(54) Title: SYNTHETIC PEPTIDES AND USES THEREFORE
(57) Abstract: A synthetic polypeptide is disclosed, which comprises a plurality of different segments of at least one parent polypeptide,
wherein the segments are linked together in a different relationship relative to their linkage in the at least one parent polypeptide
to impede, abrogate or otherwise alter at least one function associated with the parent polypeptide. Synthetic polynucleotides are
also disclosed that code for the synthetic polypeptides of the invention as well as expression constructs comprising the synthetic
polynucleotides. Also disclosed are methods for constructing the aforementioned molecules and immunopotentiating compositions
and methods for treating and/or preventing a disease or condition.

SYNTHETIC PEPTIDES AND USES THEREFORE
FIELD OF THE INVENTION 

THIS INVENTION relates generally to agents for modulating immune responses. 
More particularly, the present invention relates to a synthetic polypeptide comprising a 
plurality of different segments of a parent polypeptide, wherein the segments are linked to
each other such that one or more functions of the parent polypeptide are impeded, 
abrogated or otherwise altered and such that the synthetic polypeptide, when introduced 
into a suitable host, can elicit an immune response against the parent polypeptide. The 
invention also relates to synthetic polynucleotides encoding the synthetic polypeptides and 
to synthetic constructs comprising these polynucleotides. The invention further relates to
the use of the polypeptides and polynucleotides of the invention in compositions for 
modulating immune responses. The invention also extends to methods of using such 
compositions for prophylactic and/or therapeutic purposes. 

Bibliographic details of various publications referred to in this specification are 
collected at the end of the description.

BACKGROUND OF THE INVENTION 

The modern reductionist approach to vaccine and therapy development has been 
pursued for a number of decades and attempts to focus only on those parts of pathogens or 
of cancer proteins which are relevant to the immune system. To date the performance of 
this approach has been relatively poor considering the vigorous research carried out and
the number of effective vaccines and therapies that it has produced. This approach is still 
being actively pursued, however, despite its poor performance because vaccines developed 
using this approach are often extremely safe and because only by completely 
understanding the immune system can new vaccine strategies be developed. 

One area that has benefited greatly from research efforts is knowledge about how
the adaptive immune system operates and more specifically how T and B cells learn to
recognise specific parts of pathogens and cancers. T cells are mainly involved in
cell-mediated immunity whereas B cells are involved in the generation of antibody-mediated
immunity. The two most important types of T cells involved in adaptive cellular immunity
are αβ CD8+ cytotoxic T lymphocytes (CTL) and CD4+ T helper lymphocytes. CTL are
important mediators of cellular immunity against many viruses, tumours, some bacteria
and some parasites because they are able to kill infected cells directly and secrete various
factors which can have powerful effects on the spread of infectious organisms. CTLs
recognise epitopes derived from foreign intracellular proteins, which are 8-10 amino acids
long and which are presented by class I major histocompatibility complex (MHC)
molecules (in humans called human leukocyte antigens - HLAs) (Jardetzky et al., 1991;
Fremont et al., 1992; Rotzschke et al., 1990). T helper cells enhance and regulate CTL
responses and are necessary for the establishment of long-lived memory CTL. They also
inhibit infectious organisms by secreting cytokines such as IFN-γ. T helper cells recognise
epitopes derived mostly from extracellular proteins which are 12-25 amino acids long and
which are presented by class II MHC molecules (Chicz et al., 1993; Newcomb et al.,
1993). B cells, or more specifically the antibodies they secrete, are important mediators in
the control and clearance of mostly extracellular organisms. Antibodies recognise mainly
conformational determinants on the surface of organisms, for example, although
sometimes they may recognise short linear determinants.

Despite significant advances towards understanding how T and linear B cell
epitopes are processed and presented to the immune system, the full potential of
epitope-based vaccines has not been fully exploited. The main reason for this is the large number
of different T cell epitopes, which have to be included into such vaccines to cover the
extreme HLA polymorphism in the human population. The human HLA diversity is one of
the main reasons why whole pathogen vaccines frequently provide better population
coverage than subunit or peptide-based vaccine strategies. There is a range of
epitope-based strategies though which have tried to solve this problem, e.g., peptide blends, peptide
conjugates and polyepitope vaccines (i.e., comprising strings of multiple epitopes) (Dyall et
al., 1995; Thomson et al., 1996; Thomson et al., 1998; Thomson et al., 1998). These
approaches however will always be sub-optimal not only because of the slow pace of
epitope characterisation but also because it is virtually impossible for them to cover every
existing HLA polymorphism in the population. A number of strategies have sought to
avoid both problems by not identifying epitopes and instead incorporating larger amounts
of sequence information, e.g., approaches using whole genes or proteins and approaches
that mix multiple protein or gene sequences together. The proteins used by these strategies
however sometimes still function and therefore can compromise vaccine safety, e.g., whole
cancer proteins. Alternative strategies have tried to improve the safety of vaccines by
fragmenting the genes and expressing them either separately or as complex mixtures, e.g.,
library DNA immunisation, or by ligating such fragments back together. These approaches
are still sub-optimal because they are too complex, generate poor levels of immunity,
cannot guarantee that all proteins no longer function and/or that all fragments are present,
which compromises substantially complete immunological coverage.

The lack of a safe and efficient vaccine strategy that can provide substantially
complete immunological coverage is an important problem, especially when trying to
develop vaccines against rapidly mutating and persistent viruses such as HIV and hepatitis
C virus, because partial population coverage could allow vaccine-resistant pathogens to
re-emerge in the future. Human immunodeficiency virus (HIV) is an RNA lentivirus
approximately 9 kb in length, which infects CD4+ T cells, causing T cell decline and AIDS
typically 3-8 years after infection. It is currently the most serious human viral infection,
evidenced by the number of people currently infected with HIV or who have died from
AIDS, estimated by the World Health Organisation (WHO) and UNAIDS in their AIDS
epidemic update (December 1999) to be 33.6 and 16.3 million people, respectively. The
spread of HIV is also now increasing fastest in areas of the world where over half of the
human population reside, hence an effective vaccine is desperately needed to curb the
spread of this epidemic. Despite the urgency, an effective vaccine for HIV is still some
way off because of delays in defining the correlates of immune protection, lack of a
suitable animal model, existence of up to 8 different subtypes of HIV and a high HIV
mutation rate.

A significant amount of research has been carried out to try and develop a vaccine
capable of generating neutralising antibody responses that can protect against field isolates
of HIV. Despite these efforts, it is now clear that the variability, instability and
inaccessibility of critical determinants on the HIV envelope protein will make it extremely
difficult and perhaps impossible to develop such a vaccine (Kwong et al., 1998). The
limited ability of antibodies to block HIV infection is also supported by the observation
that development of AIDS correlates primarily with a reduction in CTL responsiveness to
HIV and not to altered antibody levels (Ogg et al., 1998). Hence CTL-mediated and not
antibody-mediated responses appear to be critical for maintaining the asymptomatic state
in vivo. There is also some evidence to suggest that pre-existing HIV-specific CTL
responses can block the establishment of a latent HIV infection. This evidence comes from
a number of cases where individuals have generated HIV-specific CTL responses without
becoming infected and appear to be protected from establishing latent HIV infections
despite repeated virus exposure (Rowland-Jones et al., 1995; Parmiani 1998). Taken
together, these observations suggest that a vaccine capable of generating a broad range of
strong CTL responses may be able to stop individuals from becoming latently infected
with HIV or at least allow infected individuals to remain asymptomatic for life. Virtually
all of the candidate HIV vaccines developed to date have been derived from subtype B
HIV proteins (western world subtype) whereas the majority of the HIV infections
worldwide are caused by subtypes A/E or C (E and A are similar except in the envelope
protein) (referred to as developing world subtypes). Hence existing candidate vaccines may
not be suitable for the more common HIV subtypes. Recently, there has been some
evidence that B subtype vaccines may be partially effective against other common HIV
subtypes (Rowland-Jones et al., 1998). Accordingly, the desirability of a vaccine still
remains, whose effectiveness is substantially complete against all isolates of all strains of
HIV.

SUMMARY OF THE INVENTION

The present invention is predicated in part on a novel strategy for enhancing the 
efficacy of an immunopotentiating composition. This strategy involves utilising the 
sequence information of a parent polypeptide to produce a synthetic polypeptide that 
comprises a plurality of different segments of the parent polypeptide, which are linked
sequentially together in a different arrangement relative to that of the parent polypeptide. 
As a result of this change in relationship, the sequence of the linked segments in the 
synthetic polypeptide is different to a sequence contained within the parent polypeptide. As 
more fully described hereinafter, the present strategy is used advantageously to cause 
significant disruption to the structure and/or function of the parent polypeptide while
minimising the destruction of potentially useful epitopes encoded by the parent 
polypeptide. 

Thus, in one aspect of the present invention, there is provided a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage
in the at least one parent polypeptide. 

In one embodiment, the synthetic polypeptide consists essentially of different 
segments of a single parent polypeptide. 

In an alternate embodiment, the synthetic polypeptide consists essentially of 
different segments of a plurality of different parent polypeptides.

Suitably, said segments in said synthetic polypeptide are linked sequentially in a 
different order or arrangement relative to that of corresponding segments in said at least 
one parent polypeptide. 

Preferably, at least one of said segments comprises partial sequence identity or 
homology to one or more other said segments. The sequence identity or homology is
preferably contained at one or both ends of said at least one segment. 

In another aspect, the invention resides in a synthetic polynucleotide encoding the 
synthetic polypeptide as broadly described above.

According to yet another aspect, the invention contemplates a synthetic construct
comprising a polynucleotide as broadly described above that is operably linked to a
regulatory polynucleotide.

In a further aspect of the invention, there is provided a method for producing a 
synthetic polynucleotide as broadly described above, comprising:

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

Preferably, the method further comprises fragmenting the sequence of a respective
parent polypeptide into fragments and linking said fragments together in a different
relationship relative to their linkage in said parent polypeptide sequence. In a preferred
embodiment of this type, the fragments are randomly linked together.
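
By way of illustration only, the fragmenting and random linking described above may be sketched in Python as follows. The function names, the segment size of 30 residues and the overlap of 15 residues are assumptions chosen for the example; the overlap stands in for the partial sequence identity at segment ends mentioned earlier.

```python
import random


def fragment(sequence, size=30, overlap=15):
    """Split a sequence into segments of `size` residues, each sharing
    `overlap` residues with the following segment (hypothetical defaults)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(sequence), step):
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # this segment already reaches the end of the parent
    return segments


def scramble(segments, seed=None):
    """Return the segments in a random order that differs from the parent order."""
    rng = random.Random(seed)
    shuffled = segments[:]
    if len(set(segments)) < 2:
        return shuffled  # nothing can be rearranged
    while shuffled == segments:
        rng.shuffle(shuffled)
    return shuffled


def design_polypeptide(parent, size=30, overlap=15, seed=None):
    """Fragment the parent sequence and link the fragments in a new order."""
    return "".join(scramble(fragment(parent, size, overlap), seed))
```

For a ten-residue toy sequence, fragment("ABCDEFGHIJ", size=4, overlap=2) yields the segments ABCD, CDEF, EFGH and GHIJ, which scramble then returns in some order other than the parent order.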

Suitably, the method further comprises reverse translating the sequence of a
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence
encoding said parent polypeptide or said segment. In a preferred embodiment of this type,
an amino acid of said parent polypeptide sequence is reverse translated to provide a codon,
which has higher translational efficiency than other synonymous codons in a cell of
interest. Suitably, an amino acid of said parent polypeptide sequence is reverse translated
to provide a codon which, in the context of adjacent or local sequence elements, has a
lower propensity of forming an undesirable sequence (e.g., a palindromic sequence or a
duplicated sequence) that is refractory to the execution of a task (e.g., cloning or
sequencing).
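
A minimal Python sketch of such a reverse translation is given below for illustration. The codon preferences shown are a small illustrative subset and are not taken from the specification, and the undesirable motif (the palindromic PstI recognition site CTGCAG) is likewise a hypothetical choice. The sketch takes the first-choice codon for each amino acid unless that codon would complete an undesirable motif with the sequence built so far, in which case the next choice is tried.

```python
# Illustrative codon preferences only (first choice, then alternatives);
# a real design would use a complete table for the cell of interest.
PREFERRED_CODONS = {
    "A": ("GCC", "GCT"),
    "E": ("GAG", "GAA"),
    "K": ("AAG", "AAA"),
    "L": ("CTG", "CTC"),
    "M": ("ATG",),
    "Q": ("CAG", "CAA"),
    "W": ("TGG",),
}

# Hypothetical motif to keep out of the design: the palindromic PstI site.
UNDESIRABLE = ("CTGCAG",)


def creates_motif(dna, codon, avoid):
    """True if appending `codon` to `dna` completes any motif in `avoid`."""
    for motif in avoid:
        # Any new occurrence must overlap the three bases just added.
        tail = (dna + codon)[-(len(motif) + 2):]
        if motif in tail:
            return True
    return False


def reverse_translate(peptide, codons=PREFERRED_CODONS, avoid=UNDESIRABLE):
    """Reverse translate a peptide using preferred codons while avoiding motifs."""
    dna = ""
    for residue in peptide:
        choices = codons[residue]
        # Fall back to the first choice if every candidate completes a motif.
        chosen = next(
            (c for c in choices if not creates_motif(dna, c, avoid)),
            choices[0],
        )
        dna += chosen
    return dna
```

With these assumed tables, the dipeptide LQ would give CTGCAG if first-choice codons were used throughout, so the sketch selects CAA for glutamine and returns CTGCAA. The selection is greedy, so a motif that can only be avoided by changing an earlier codon is not handled.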

In another aspect, the invention encompasses a computer program product for
designing the sequence of a synthetic polypeptide as broadly described above, comprising:

- code that receives as input the sequence of at least one parent polypeptide;

- code that fragments the sequence of a respective parent polypeptide into
fragments;

- code that links together said fragments in a different relationship relative to their
linkage in said parent polypeptide sequence; and

- a computer readable medium that stores the codes.

In yet another aspect, the invention provides a computer program product for
designing the sequence of a synthetic polynucleotide as broadly described above,
comprising:

- code that receives as input the sequence of at least one parent polypeptide;

- code that fragments the sequence of a respective parent polypeptide into
fragments;

- code that reverse translates the sequence of a respective fragment to provide a
nucleic acid sequence encoding said fragment;

- code that links together in the same reading frame each said nucleic acid
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in
which said fragments are linked together in a different relationship relative to their
linkage in the at least one parent polypeptide sequence; and

- a computer readable medium that stores the codes.
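
The reverse translation and in-frame linkage steps listed above might be sketched as follows. This is an illustrative Python example only: the function names are hypothetical, the new fragment order is supplied by the caller, and each amino acid is mapped to an arbitrary single codon (the first one found in the standard genetic code table) rather than to a codon chosen for translational efficiency.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks a stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Arbitrary single codon per amino acid (assumption made for brevity).
ONE_CODON = {}
for _codon, _aa in CODON_TABLE.items():
    ONE_CODON.setdefault(_aa, _codon)


def reverse_translate(peptide):
    """Encode a peptide fragment as DNA, one codon per residue."""
    return "".join(ONE_CODON[aa] for aa in peptide)


def translate(dna):
    """Translate DNA in reading frame 1 using the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def design_polynucleotide(fragments, order):
    """Link the coding sequences of `fragments` in the new `order`.

    `order` is a permutation of fragment indices that differs from the
    parent order. Every coding sequence is a whole number of codons, so
    simple concatenation keeps all fragments in the same reading frame.
    """
    parent_order = list(range(len(fragments)))
    if sorted(order) != parent_order:
        raise ValueError("order must be a permutation of the fragment indices")
    if list(order) == parent_order:
        raise ValueError("order must differ from the parent arrangement")
    return "".join(reverse_translate(fragments[i]) for i in order)
```

Translating the result back, for example translate(design_polynucleotide(["MKV", "LAG", "WTE"], [2, 0, 1])), returns WTEMKVLAG, which confirms that the fragments remain in frame after rearrangement.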

In still yet another aspect, the invention provides a computer for designing the
sequence of a synthetic polypeptide as broadly described above, wherein said computer
comprises:

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said machine-readable data comprise the
sequence of at least one parent polypeptide;

(b) a working memory for storing instructions for processing said machine-readable
data;

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium, for processing said machine-readable data to provide said
synthetic polypeptide sequence; and

(d) an output hardware coupled to said central processing unit, for receiving said
synthetic polypeptide sequence.

In a preferred embodiment, the processing of said machine-readable data
comprises fragmenting the sequence of a respective parent polypeptide into fragments and
linking together said fragments in a different relationship relative to their linkage in the
sequence of said parent polypeptide.

In still yet another aspect, the invention resides in a computer for designing the
sequence of a synthetic polynucleotide as broadly described above, wherein said computer
comprises:

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said machine-readable data comprise the
sequence of at least one parent polypeptide;

(b) a working memory for storing instructions for processing said machine-readable
data;

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium, for processing said machine-readable data to provide said
synthetic polynucleotide sequence; and

(d) an output hardware coupled to said central processing unit, for receiving said
synthetic polynucleotide sequence.

In a preferred embodiment, the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

According to another aspect, the invention contemplates a composition,
comprising an immunopotentiating agent selected from the group consisting of a synthetic
polypeptide as broadly described above, a synthetic polynucleotide as broadly described
above and a synthetic construct as broadly described above, together with a
pharmaceutically acceptable carrier.

The composition may optionally comprise an adjuvant.

In a further aspect, the invention encompasses a method for modulating an 
immune response, which response is preferably directed against a pathogen or a cancer, 
comprising administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
broadly described above, a synthetic polynucleotide as broadly described above and a 
synthetic construct as broadly described above, or a composition as broadly described 
above. 

According to still a further aspect of the invention, there is provided a method for 
treatment and/or prophylaxis of a disease or condition, comprising administering to a
patient in need of such treatment an effective amount of an immunopotentiating agent 
selected from the group consisting of a synthetic polypeptide as broadly described above, a 
synthetic polynucleotide as broadly described above and a synthetic construct as broadly 
described above, or a composition as broadly described above. 

The invention also encompasses the use of the synthetic polypeptide, the synthetic
polynucleotide and the synthetic construct as broadly described above in the study and
modulation of immune responses.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagrammatic representation showing the number of people living 
with AIDS in 1998 in various parts of the world and most prevalent HIV clades in these 
regions. Estimates generated by UNAIDS. 

Figure 2 is a graphical representation showing trends in the incidence of the
common HIV clades and estimates for the future. Graph from the International AIDS
Vaccine Initiative (IAVI).

Figure 3 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV gag [SEQ ID NO: 1] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV gag protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 4 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV pol [SEQ ID NO: 2] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV pol protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 5 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV vif [SEQ ID NO: 3] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vif protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 6 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV vpr [SEQ ID NO: 4] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vpr protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 7 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV tat [SEQ ID NO: 5] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV tat protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 8 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV rev [SEQ ID NO: 6] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV rev protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 9 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV vpu [SEQ ID NO: 7] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vpu protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 10 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV env [SEQ ID NO: 8] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV env protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 11 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV nef [SEQ ID NO: 9] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV nef protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 12 is a diagrammatic representation depicting the systematic segmentation
of the designed degenerate consensus sequences for each HIV protein and the reverse
translation of each segment into a DNA sequence. Also shown is the number of segments
used during random rearrangement and amino acids that were removed. Amino acids
surrounded by an open square were removed from the design, because degenerate codons
to cater for the desired amino acid combination required too many degenerate bases to
comply with the incorporation of degenerate sequence rules outlined in the description of
the invention herein. Amino acids surrounded by an open circle were removed only in the
segment concerned mainly because they were coded for in an oligonucleotide overlap
region. Amino acids marked with an asterisk were designed differently in one fragment
compared to the corresponding overlap region (see tat gene).

Figure 13 is a diagrammatic representation showing the first and second most
frequently used codons in mammals used to reverse translate HIV protein segments. Also
shown are all first and second most frequently used degenerate codons for two amino acids
where only one base is varied. Codons used where more than one base was varied were
worked out in each case by comparing all the codons for each amino acid. The IUPAC
codes for degenerate bases are also shown.
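
For illustration, a degenerate codon covering two amino acids whose codons differ at a single base, as referred to for Figure 13, could be derived as in the following Python sketch. The function name and the example codons are assumptions made for the example; only the two-base IUPAC codes are standard nomenclature.

```python
# Standard IUPAC codes for a position that may hold either of two bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def degenerate_codon(codon_a, codon_b):
    """Merge two codons that differ at exactly one base into one degenerate
    codon. Returns None if they differ at no position or at more than one."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three bases long")
    diffs = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    code = IUPAC_TWO_BASE[frozenset((codon_a[i], codon_b[i]))]
    return codon_a[:i] + code + codon_a[i + 1:]
```

For example, AAG (lysine) and AGG (arginine) differ only at the second base, so degenerate_codon("AAG", "AGG") gives ARG, where R denotes A or G.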

Figure 14 illustrates the construction plan for the HIV Savine showing the
approximate sizes of the subcassettes, cassettes and full-length Savine cDNA and the
restriction sites involved in joining them together. Also shown are the extra sequences
added onto each subcassette during their design and a brief description of how the
subcassettes, cassettes and full-length cDNA were constructed and transferred into
appropriate DNA plasmids. Description of full-length construction: pA was cleaved with
XhoI/SalI and cloned into XhoI arms of the B cassette; pAB was cleaved with XhoI and
cloned into XhoI arms of the C cassette; full-length construct is excisable with either
XbaI/BamHI at the 5' end or BglII at the 3' end. Options for excising cassettes: A)
XbaI/BamHI at the 5' end, BglII/XhoI at the 3' end; B) XbaI/BamHI at the 5' end,
BglII/SalI at the 3' end; C) XbaI/BamHI at the 5' end, BglII/SalI at the 3' end. Cleaving
plasmid vectors: pDNAVacc is cleavable with XbaI/XhoI (DNA vaccination); pBCB07 or
pTK7.5 vectors are cleavable with BamHI/SalI (Recombinant Vaccinia); pAvipox vector
pAF09 is cleavable with BamHI/SalI (Recombinant Avipox).

Figure 15 shows the full-length DNA (17253 bp) and protein sequence (5742 aas)
of the HIV Savine construct. Fragment boundaries are shown, together with the position of
each fragment in each designed HIV protein, fragment number (in brackets), spacer
residues (two alanine residues) and which fragment the spacer was for (open boxes and
arrows). The location of residual restriction site joining sequences corresponding to
subcassette or cassette boundaries (shaded boxes) are also shown, along with start and stop
codons, Kozak sequence, the location of the murine influenza virus CTL epitope sequence
(near the 3' end), important restriction sites at each end and the position of each degenerate
amino acid (indicated by 'X').

Figure 16 depicts the layout and position of oligonucleotides in the designed DNA
sequence for subcassette A1. The sequences which anneal to the short amplification
oligonucleotides are indicated by hatched boxes and the position of oligonucleotide
overlap regions are dark shaded.

Figure 17: Panel (a) depicts the stepwise asymmetric PCR of the two halves of
subcassette A1 (lanes 2-5 and 7-9, respectively) and final splicing together by SOEing
(lane 10). DNA standards in lane 1 are pUC18 digested with Sau3AI. Panel (b) shows the
stepwise ligation-mediated joining and PCR amplification of each cassette as indicated.
DNA standards in lane 1 are SPP1 cut with EcoRI.

Figure 18: Panel (a) shows a summary of the construction of the DNA vaccine
plasmids that express one HIV Savine cassette. Panel (b) shows a summary of the
construction of the plasmids used for marker rescue recombination to generate Vaccinia
viruses expressing one HIV Savine cassette. Panel (c) shows a summary of the
construction of the DNA vaccine plasmids which each express a version of the full-length
HIV Savine cDNA.

Figure 19 shows restimulation of HIV-specific polyclonal CTL responses from
three HIV-infected patients by the HIV Savine constructs. PBMCs from three different
patients were restimulated for 7 days by infection with Vaccinia virus pools expressing the
HIV Savine cassettes: Pool 1 included VV-AC1 and VV-BC1; Pool 2 included VV-AC2,
VV-BC2 and VV-CC2. The restimulated PBMCs were then mixed with autologous LCLs
(effector to target ratio of 50:1), which were either uninfected or infected with either
Vaccinia viruses expressing the HIV proteins gag (VV-gag), env (VV-env) or pol
(VV-pol), VV-HIV Savine pools 1 (light bars) or 2 (dark bars) or a control Vaccinia virus
(VV-Lac), and the amount of 51Cr released was used to determine percent specific lysis. K562 cells
were used to determine the level of NK cell-mediated killing in their stimulated culture.

Figure 20 is a diagrammatic representation showing CD4+ proliferation of
PBMCs from HIV-1 infected patients restimulated with either Pool 1 or Pool 2 of the HIV-1
Savine. Briefly, PBMCs were stained with CFSE and cultured for 6 days with or without
VVs encoding either Pool 1 or Pool 2 of the HIV-1 Savine. Restimulated cells were then
labelled with antibodies and analysed by FACS.

Figure 21 is a graphical representation showing the CTL response in mice
vaccinated with the HIV Savine. C57BL6 mice were immunised with the HIV-1 Savine
DNA vaccine comprising the six plasmids described in Figure 18a (100 µg total DNA was
given as 50 µg/leg i.m.). One week later Poxviruses (1x10^7 pfu) comprising Pool 1 of the
HIV-1 Savine were used to boost the immune responses. Three weeks later splenocytes
from these mice were restimulated with VV-Pool 1 or VV-Pool 2 for 5 days and the
resultant effectors used in a 51Cr release cytotoxicity assay against targets infected with
CTRVV, VV-pools or VV expressing the natural antigens from HIV-1.

Figure 22 shows immune responses of HIV-immune macaques (vaccinated with
recombinant FPV expressing gag-pol and challenged with HIV-1 2 years prior to the
experiment). Monkeys 1 and 2 were immunised once at day 0 with VV Savine pool 1
(three VVs which together express the entire HIV Savine). Monkey 3 was immunised
twice with FPV-gag-pol, i.e., Day 0 is 3 weeks after the first FPV-gag-pol immunisation. A)
IFN-γ detection by ELISPOT of whole blood (0.5 mL, venous blood heparin-anticoagulated)
stimulated with Aldrithiol-2 inactivated whole HIV-1 (20 hours, 20
µg/mL). Plasma samples were then centrifuged (1000×g) and assayed in duplicate for
antigen-specific IFN-γ using capture ELISA. B) Flow cytometric detection of HIV-1 specific
CD69+/CD8+ T cells. Freshly isolated PBMCs were stimulated with inactivated HIV-1 as
above for 16 hours, washed and labelled with the antibodies. Cells were then analysed
using a FACScalibur™ flow cytometer and data analysed using Cell-Quest software. C)
Flow cytometric detection of HIV-1 specific CD69+/CD4+ T cells carried out as in B).

Figure 23 shows a diagram of a system used to carry out the instructions encoded
by the storage medium of Figures 28 and 29.

Figure 24 depicts a flow diagram showing an embodiment of a method for
designing synthetic polynucleotides and synthetic polypeptides of the invention.

Figure 25 shows an algorithm, which inter alia utilises the steps of the method
shown in Figure 24.

Figure 26 shows an example of applying the algorithm of Figure 25 to an input
consensus polyprotein sequence of Hepatitis C 1a to execute the segmentation of the
polyprotein sequence, the rearrangement of the segments, the linkage of the rearranged
segments and the outputting of synthetic polynucleotide and polypeptide sequences for the
preparation of Savines for treating and/or preventing Hepatitis C infection.
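
The segmentation, rearrangement and linkage steps named in this legend can be sketched as follows. This is an illustrative sketch only and not the algorithm of Figure 25: the 30-residue segment size is taken from the segment lengths listed in Table A, whereas the 15-residue overlap, the use of a seeded random shuffle as the rearrangement rule, and all function names are assumptions introduced here.

```python
import random


def segment(sequence, size=30, overlap=15):
    """Split a sequence into windows of `size` residues overlapping by `overlap`.

    The final window may be shorter than `size`. Both defaults are assumptions.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    start = 0
    while True:
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break
        start += step
    return segments


def rearrange(segments, seed=0):
    """Return the segments in a different order (seeded shuffle, assumed rule)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def link(segments):
    """Join the rearranged segments end to end into one synthetic sequence."""
    return "".join(segments)


if __name__ == "__main__":
    # Hypothetical 499-residue parent sequence built from the 20 amino acid letters.
    rng = random.Random(1)
    parent = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(499))
    parts = segment(parent)
    synthetic = link(rearrange(parts))
    print(len(parts), len(parts[-1]), len(synthetic))
```

Under these assumptions a 499-residue input yields 33 segments, the last of which is 19 residues long, which agrees with the 33 GAG segments and the 19 aa final GAG segment listed in Table A.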

Figure 27 illustrates an example of applying the algorithm of Figure 25 to input
consensus melanocyte differentiation antigens (gp100, MART, TRP-1, Tyros, Trp-2,
MC1R, MUC1F and MUC1R) and to consensus melanoma specific antigens (BAGE,
GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a, NYNSO1b and
LAGE1) to facilitate segmentation of those sequences, to rearrange the segments, to link
the rearranged segments and to output synthetic polynucleotide and polypeptide sequences for the
preparation of Savines for treating and/or preventing melanoma.

Figure 28 shows a cross section of a magnetic storage medium. 

Figure 29 shows a cross section of an optically readable data storage medium. 
Figure 30 shows six HIV Savine cassette sequences (A1 [SEQ ID NO: 393], A2
[SEQ ID NO: 399], B1 [SEQ ID NO: 395], B2 [SEQ ID NO: 401], C1 [SEQ ID NO: 397]
and C2 [SEQ ID NO: 403]). A1, B1 and C1 can be joined together using, for example,
convenient restriction enzyme sites provided at the ends of each cassette to construct an
embodiment of a full length HIV Savine [SEQ ID NO: 405]. A2, B2 and C2 can also be
joined together to provide another embodiment of a full length HIV Savine with 350 aa
mutations common in major HIV clades. The cassettes A/B/C can be joined into single
constructs using specific restriction enzyme sites incorporated after the start codon or
before the stop codon in the cassettes.
BRIEF DESCRIPTION OF THE SEQUENCES: SUMMARY TABLE
TABLE A

SEQUENCE ID NUMBER | SEQUENCE | LENGTH
SEQ ID NO: 1 | GAG consensus polypeptide | 499 aa
SEQ ID NO: 2 | POL consensus polypeptide | 995 aa
SEQ ID NO: 3 | VIF consensus polypeptide |
SEQ ID NO: 4 | VPR consensus polypeptide | 96 aa
SEQ ID NO: 5 | TAT consensus polypeptide | 102 aa
SEQ ID NO: 6 | REV consensus polypeptide | 123 aa
SEQ ID NO: 7 | VPU consensus polypeptide | 81 aa
SEQ ID NO: 8 | ENV consensus polypeptide | 651 aa
SEQ ID NO: 9 | NEF consensus polypeptide | 206 aa
SEQ ID NO: 10 | GAG segment 1 | 90 nts
SEQ ID NO: 11 | Polypeptide encoded by SEQ ID NO: 10 | 30 aa
SEQ ID NO: 12 | GAG segment 2 | 90 nts
SEQ ID NO: 13 | Polypeptide encoded by SEQ ID NO: 12 | 30 aa
SEQ ID NO: 14 | GAG segment 3 | 90 nts
SEQ ID NO: 15 | Polypeptide encoded by SEQ ID NO: 14 | 30 aa
SEQ ID NO: 16 | GAG segment 4 | 90 nts
SEQ ID NO: 17 | Polypeptide encoded by SEQ ID NO: 16 | 30 aa
SEQ ID NO: 18 | GAG segment 5 | 90 nts
SEQ ID NO: 19 | Polypeptide encoded by SEQ ID NO: 18 | 30 aa
SEQ ID NO: 20 | GAG segment 6 | 90 nts
SEQ ID NO: 21 | Polypeptide encoded by SEQ ID NO: 20 | 30 aa
SEQ ID NO: 22 | GAG segment 7 | 90 nts
SEQ ID NO: 23 | Polypeptide encoded by SEQ ID NO: 22 | 30 aa
SEQ ID NO: 24 | GAG segment 8 | 90 nts
SEQ ID NO: 25 | Polypeptide encoded by SEQ ID NO: 24 | 30 aa
SEQ ID NO: 26 | GAG segment 9 | 90 nts
SEQ ID NO: 27 | Polypeptide encoded by SEQ ID NO: 26 | 30 aa
SEQ ID NO: 28 | GAG segment 10 | 90 nts
SEQ ID NO: 29 | Polypeptide encoded by SEQ ID NO: 28 | 30 aa
SEQ ID NO: 30 | GAG segment 11 | 90 nts
SEQ ID NO: 31 | Polypeptide encoded by SEQ ID NO: 30 | 30 aa
SEQ ID NO: 32 | GAG segment 12 | 90 nts
SEQ ID NO: 33 | Polypeptide encoded by SEQ ID NO: 32 | 30 aa
SEQ ID NO: 34 | GAG segment 13 | 90 nts
SEQ ID NO: 35 | Polypeptide encoded by SEQ ID NO: 34 | 30 aa
SEQ ID NO: 36 | GAG segment 14 | 90 nts
SEQ ID NO: 37 | Polypeptide encoded by SEQ ID NO: 36 | 30 aa
SEQ ID NO: 38 | GAG segment 15 | 90 nts
SEQ ID NO: 39 | Polypeptide encoded by SEQ ID NO: 38 | 30 aa
SEQ ID NO: 40 | GAG segment 16 | 90 nts
SEQ ID NO: 41 | Polypeptide encoded by SEQ ID NO: 40 | 30 aa
SEQ ID NO: 42 | GAG segment 17 | 90 nts
SEQ ID NO: 43 | Polypeptide encoded by SEQ ID NO: 42 | 30 aa
SEQ ID NO: 44 | GAG segment 18 | 90 nts
SEQ ID NO: 45 | Polypeptide encoded by SEQ ID NO: 44 | 30 aa
SEQ ID NO: 46 | GAG segment 19 | 90 nts
SEQ ID NO: 47 | Polypeptide encoded by SEQ ID NO: 46 | 30 aa
SEQ ID NO: 48 | GAG segment 20 | 90 nts
SEQ ID NO: 49 | Polypeptide encoded by SEQ ID NO: 48 | 30 aa
SEQ ID NO: 50 | GAG segment 21 | 90 nts
SEQ ID NO: 51 | Polypeptide encoded by SEQ ID NO: 50 | 30 aa
SEQ ID NO: 52 | GAG segment 22 | 90 nts
SEQ ID NO: 53 | Polypeptide encoded by SEQ ID NO: 52 | 30 aa
SEQ ID NO: 54 | GAG segment 23 | 90 nts
SEQ ID NO: 55 | Polypeptide encoded by SEQ ID NO: 54 | 30 aa
SEQ ID NO: 56 | GAG segment 24 | 90 nts
SEQ ID NO: 57 | Polypeptide encoded by SEQ ID NO: 56 | 30 aa
SEQ ID NO: 58 | GAG segment 25 | 90 nts
SEQ ID NO: 59 | Polypeptide encoded by SEQ ID NO: 58 | 30 aa
SEQ ID NO: 60 | GAG segment 26 | 90 nts
SEQ ID NO: 61 | Polypeptide encoded by SEQ ID NO: 60 | 30 aa
SEQ ID NO: 62 | GAG segment 27 | 90 nts
SEQ ID NO: 63 | Polypeptide encoded by SEQ ID NO: 62 | 30 aa
SEQ ID NO: 64 | GAG segment 28 | 90 nts
SEQ ID NO: 65 | Polypeptide encoded by SEQ ID NO: 64 | 30 aa
SEQ ID NO: 66 | GAG segment 29 | 90 nts
SEQ ID NO: 67 | Polypeptide encoded by SEQ ID NO: 66 | 30 aa
SEQ ID NO: 68 | GAG segment 30 | 90 nts
SEQ ID NO: 69 | Polypeptide encoded by SEQ ID NO: 68 | 30 aa
SEQ ID NO: 70 | GAG segment 31 | 90 nts
SEQ ID NO: 71 | Polypeptide encoded by SEQ ID NO: 70 | 30 aa
SEQ ID NO: 72 | GAG segment 32 | 90 nts
SEQ ID NO: 73 | Polypeptide encoded by SEQ ID NO: 72 | 30 aa
SEQ ID NO: 74 | GAG segment 33 | 57 nts
SEQ ID NO: 75 | Polypeptide encoded by SEQ ID NO: 74 | 19 aa
SEQ ID NO: 76 | POL segment 1 | 90 nts
SEQ ID NO: 77 | Polypeptide encoded by SEQ ID NO: 76 | 30 aa
SEQ ID NO: 78 | POL segment 2 | 90 nts
SEQ ID NO: 79 | Polypeptide encoded by SEQ ID NO: 78 | 30 aa
SEQ ID NO: 80 | POL segment 3 | 90 nts
SEQ ID NO: 81 | Polypeptide encoded by SEQ ID NO: 80 | 30 aa
SEQ ID NO: 82 | POL segment 4 | 90 nts
SEQ ID NO: 83 | Polypeptide encoded by SEQ ID NO: 82 | 30 aa
SEQ ID NO: 84 | POL segment 5 | 90 nts
SEQ ID NO: 85 | Polypeptide encoded by SEQ ID NO: 84 | 30 aa
SEQ ID NO: 86 | POL segment 6 | 90 nts
SEQ ID NO: 87 | Polypeptide encoded by SEQ ID NO: 86 | 30 aa
SEQ ID NO: 88 | POL segment 7 | 90 nts
SEQ ID NO: 89 | Polypeptide encoded by SEQ ID NO: 88 | 30 aa
SEQ ID NO: 90 | POL segment 8 | 90 nts
SEQ ID NO: 91 | Polypeptide encoded by SEQ ID NO: 90 | 30 aa
SEQ ID NO: 92 | POL segment 9 | 90 nts
SEQ ID NO: 93 | Polypeptide encoded by SEQ ID NO: 92 | 30 aa
SEQ ID NO: 94 | POL segment 10 | 90 nts
SEQ ID NO: 95 | Polypeptide encoded by SEQ ID NO: 94 | 30 aa
SEQ ID NO: 96 | POL segment 11 | 90 nts
SEQ ID NO: 97 | Polypeptide encoded by SEQ ID NO: 96 | 30 aa
SEQ ID NO: 98 | POL segment 12 | 90 nts
SEQ ID NO: 99 | Polypeptide encoded by SEQ ID NO: 98 | 30 aa
SEQ ID NO: 100 | POL segment 13 | 90 nts
SEQ ID NO: 101 | Polypeptide encoded by SEQ ID NO: 100 | 30 aa
SEQ ID NO: 102 | POL segment 14 | 90 nts
SEQ ID NO: 103 | Polypeptide encoded by SEQ ID NO: 102 | 30 aa
SEQ ID NO: 104 | POL segment 15 | 90 nts
SEQ ID NO: 105 | Polypeptide encoded by SEQ ID NO: 104 | 30 aa
SEQ ID NO: 106 | POL segment 16 | 90 nts
SEQ ID NO: 107 | Polypeptide encoded by SEQ ID NO: 106 | 30 aa
SEQ ID NO: 108 | POL segment 17 | 90 nts
SEQ ID NO: 109 | Polypeptide encoded by SEQ ID NO: 108 | 30 aa
SEQ ID NO: 110 | POL segment 18 | 90 nts
SEQ ID NO: 111 | Polypeptide encoded by SEQ ID NO: 110 | 30 aa
SEQ ID NO: 112 | POL segment 19 | 90 nts
SEQ ID NO: 113 | Polypeptide encoded by SEQ ID NO: 112 | 30 aa
SEQ ID NO: 114 | POL segment 20 | 90 nts
SEQ ID NO: 115 | Polypeptide encoded by SEQ ID NO: 114 | 30 aa
SEQ ID NO: 116 | POL segment 21 | 90 nts
SEQ ID NO: 117 | Polypeptide encoded by SEQ ID NO: 116 | 30 aa
SEQ ID NO: 118 | POL segment 22 | 90 nts
SEQ ID NO: 119 | Polypeptide encoded by SEQ ID NO: 118 | 30 aa
SEQ ID NO: 120 | POL segment 23 | 90 nts
SEQ ID NO: 121 | Polypeptide encoded by SEQ ID NO: 120 | 30 aa
SEQ ID NO: 122 | POL segment 24 | 90 nts
SEQ ID NO: 123 | Polypeptide encoded by SEQ ID NO: 122 | 30 aa
SEQ ID NO: 124 | POL segment 25 | 90 nts
SEQ ID NO: 125 | Polypeptide encoded by SEQ ID NO: 124 | 30 aa
SEQ ID NO: 126 | POL segment 26 | 90 nts
SEQ ID NO: 127 | Polypeptide encoded by SEQ ID NO: 126 | 30 aa
SEQ ID NO: 128 | POL segment 27 | 90 nts
SEQ ID NO: 129 | Polypeptide encoded by SEQ ID NO: 128 | 30 aa
SEQ ID NO: 130 | POL segment 28 | 90 nts
SEQ ID NO: 131 | Polypeptide encoded by SEQ ID NO: 130 | 30 aa
SEQ ID NO: 132 | POL segment 29 | 90 nts
SEQ ID NO: 133 | Polypeptide encoded by SEQ ID NO: 132 | 30 aa
SEQ ID NO: 134 | POL segment 30 | 90 nts
SEQ ID NO: 135 | Polypeptide encoded by SEQ ID NO: 134 | 30 aa
SEQ ID NO: 136 | POL segment 31 | 90 nts
SEQ ID NO: 137 | Polypeptide encoded by SEQ ID NO: 136 | 30 aa
SEQ ID NO: 138 | POL segment 32 | 90 nts
SEQ ID NO: 139 | Polypeptide encoded by SEQ ID NO: 138 | 30 aa
SEQ ID NO: 140 | POL segment 33 | 90 nts
SEQ ID NO: 141 | Polypeptide encoded by SEQ ID NO: 140 | 30 aa
SEQ ID NO: 142 | POL segment 34 | 90 nts
SEQ ID NO: 143 | Polypeptide encoded by SEQ ID NO: 142 | 30 aa
SEQ ID NO: 144 | POL segment 35 | 90 nts
SEQ ID NO: 145 | Polypeptide encoded by SEQ ID NO: 144 | 30 aa
SEQ ID NO: 146 | POL segment 36 | 90 nts
SEQ ID NO: 147 | Polypeptide encoded by SEQ ID NO: 146 | 30 aa
SEQ ID NO: 148 | POL segment 37 | 90 nts
SEQ ID NO: 149 | Polypeptide encoded by SEQ ID NO: 148 | 30 aa
SEQ ID NO: 150 | POL segment 38 | 90 nts
SEQ ID NO: 151 | Polypeptide encoded by SEQ ID NO: 150 | 30 aa
SEQ ID NO: 152 | POL segment 39 | 90 nts
SEQ ID NO: 153 | Polypeptide encoded by SEQ ID NO: 152 | 30 aa
SEQ ID NO: 154 | POL segment 40 | 90 nts
SEQ ID NO: 155 | Polypeptide encoded by SEQ ID NO: 154 | 30 aa
SEQ ID NO: 156 | POL segment 41 | 90 nts
SEQ ID NO: 157 | Polypeptide encoded by SEQ ID NO: 156 | 30 aa
SEQ ID NO: 158 | POL segment 42 | 90 nts
SEQ ID NO: 159 | Polypeptide encoded by SEQ ID NO: 158 | 30 aa
SEQ ID NO: 160 | POL segment 43 | 90 nts
SEQ ID NO: 161 | Polypeptide encoded by SEQ ID NO: 160 | 30 aa
SEQ ID NO: 162 | POL segment 44 | 90 nts
SEQ ID NO: 163 | Polypeptide encoded by SEQ ID NO: 162 | 30 aa
SEQ ID NO: 164 | POL segment 45 | 90 nts
SEQ ID NO: 165 | Polypeptide encoded by SEQ ID NO: 164 | 30 aa
SEQ ID NO: 166 | POL segment 46 | 90 nts
SEQ ID NO: 167 | Polypeptide encoded by SEQ ID NO: 166 | 30 aa
SEQ ID NO: 168 | POL segment 47 | 90 nts
SEQ ID NO: 169 | Polypeptide encoded by SEQ ID NO: 168 | 30 aa
SEQ ID NO: 170 | POL segment 48 | 90 nts
SEQ ID NO: 171 | Polypeptide encoded by SEQ ID NO: 170 | 30 aa
SEQ ID NO: 172 | POL segment 49 | 90 nts
SEQ ID NO: 173 | Polypeptide encoded by SEQ ID NO: 172 | 30 aa
SEQ ID NO: 174 | POL segment 50 | 90 nts
SEQ ID NO: 175 | Polypeptide encoded by SEQ ID NO: 174 | 30 aa
SEQ ID NO: 176 | POL segment 51 | 90 nts
SEQ ID NO: 177 | Polypeptide encoded by SEQ ID NO: 176 | 30 aa
SEQ ID NO: 178 | POL segment 52 | 90 nts
SEQ ID NO: 179 | Polypeptide encoded by SEQ ID NO: 178 | 30 aa
SEQ ID NO: 180 | POL segment 53 | 90 nts
SEQ ID NO: 181 | Polypeptide encoded by SEQ ID NO: 180 | 30 aa
SEQ ID NO: 182 | POL segment 54 | 90 nts
SEQ ID NO: 183 | Polypeptide encoded by SEQ ID NO: 182 | 30 aa
SEQ ID NO: 184 | POL segment 55 | 90 nts
SEQ ID NO: 185 | Polypeptide encoded by SEQ ID NO: 184 | 30 aa
SEQ ID NO: 186 | POL segment 56 | 90 nts
SEQ ID NO: 187 | Polypeptide encoded by SEQ ID NO: 186 | 30 aa
SEQ ID NO: 188 | POL segment 57 | 90 nts
SEQ ID NO: 189 | Polypeptide encoded by SEQ ID NO: 188 | 30 aa
SEQ ID NO: 190 | POL segment 58 | 90 nts
SEQ ID NO: 191 | Polypeptide encoded by SEQ ID NO: 190 | 30 aa
SEQ ID NO: 192 | POL segment 59 | 90 nts
SEQ ID NO: 193 | Polypeptide encoded by SEQ ID NO: 192 | 30 aa
SEQ ID NO: 194 | POL segment 60 | 90 nts
SEQ ID NO: 195 | Polypeptide encoded by SEQ ID NO: 194 | 30 aa
SEQ ID NO: 196 | POL segment 61 | 90 nts
SEQ ID NO: 197 | Polypeptide encoded by SEQ ID NO: 196 | 30 aa
SEQ ID NO: 198 | POL segment 62 | 90 nts
SEQ ID NO: 199 | Polypeptide encoded by SEQ ID NO: 198 | 30 aa
SEQ ID NO: 200 | POL segment 63 | 90 nts
SEQ ID NO: 201 | Polypeptide encoded by SEQ ID NO: 200 | 30 aa
SEQ ID NO: 202 | POL segment 64 | 90 nts
SEQ ID NO: 203 | Polypeptide encoded by SEQ ID NO: 202 | 30 aa
SEQ ID NO: 204 | POL segment 65 | 90 nts
SEQ ID NO: 205 | Polypeptide encoded by SEQ ID NO: 204 | 30 aa
SEQ ID NO: 206 | POL segment 66 | 60 nts
SEQ ID NO: 207 | Polypeptide encoded by SEQ ID NO: 206 | 20 aa
SEQ ID NO: 208 | VIF segment 1 | 90 nts
SEQ ID NO: 209 | Polypeptide encoded by SEQ ID NO: 208 | 30 aa
SEQ ID NO: 210 | VIF segment 2 | 90 nts
SEQ ID NO: 211 | Polypeptide encoded by SEQ ID NO: 210 | 30 aa
SEQ ID NO: 212 | VIF segment 3 | 90 nts
SEQ ID NO: 213 | Polypeptide encoded by SEQ ID NO: 212 | 30 aa
SEQ ID NO: 214 | VIF segment 4 | 90 nts
SEQ ID NO: 215 | Polypeptide encoded by SEQ ID NO: 214 | 30 aa
SEQ ID NO: 216 | VIF segment 5 | 90 nts
SEQ ID NO: 217 | Polypeptide encoded by SEQ ID NO: 216 | 30 aa
SEQ ID NO: 218 | VIF segment 6 | 90 nts
SEQ ID NO: 219 | Polypeptide encoded by SEQ ID NO: 218 | 30 aa
SEQ ID NO: 220 | VIF segment 7 | 90 nts
SEQ ID NO: 221 | Polypeptide encoded by SEQ ID NO: 220 | 30 aa
SEQ ID NO: 222 | VIF segment 8 | 90 nts
SEQ ID NO: 223 | Polypeptide encoded by SEQ ID NO: 222 | 30 aa
SEQ ID NO: 224 | VIF segment 9 | 90 nts
SEQ ID NO: 225 | Polypeptide encoded by SEQ ID NO: 224 | 30 aa
SEQ ID NO: 226 | VIF segment 10 | 90 nts
SEQ ID NO: 227 | Polypeptide encoded by SEQ ID NO: 226 | 30 aa
SEQ ID NO: 228 | VIF segment 11 | 90 nts
SEQ ID NO: 229 | Polypeptide encoded by SEQ ID NO: 228 | 30 aa
SEQ ID NO: 230 | VIF segment 12 | 81 nts
SEQ ID NO: 231 | Polypeptide encoded by SEQ ID NO: 230 | 27 aa
SEQ ID NO: 232 | VPR segment 1 | 90 nts
SEQ ID NO: 233 | Polypeptide encoded by SEQ ID NO: 232 | 30 aa
SEQ ID NO: 234 | VPR segment 2 | 90 nts
SEQ ID NO: 235 | Polypeptide encoded by SEQ ID NO: 234 | 30 aa
SEQ ID NO: 236 | VPR segment 3 | 90 nts
SEQ ID NO: 237 | Polypeptide encoded by SEQ ID NO: 236 | 30 aa
SEQ ID NO: 238 | VPR segment 4 | 90 nts
SEQ ID NO: 239 | Polypeptide encoded by SEQ ID NO: 238 | 30 aa
SEQ ID NO: 240 | VPR segment 5 | 90 nts
SEQ ID NO: 241 | Polypeptide encoded by SEQ ID NO: 240 | 30 aa
SEQ ID NO: 242 | VPR segment 6 | 63 nts
SEQ ID NO: 243 | Polypeptide encoded by SEQ ID NO: 242 | 21 aa
SEQ ID NO: 244 | TAT segment 1 | 90 nts
SEQ ID NO: 245 | Polypeptide encoded by SEQ ID NO: 244 | 30 aa
SEQ ID NO: 246 | TAT segment 2 | 90 nts
SEQ ID NO: 247 | Polypeptide encoded by SEQ ID NO: 246 | 30 aa
SEQ ID NO: 248 | TAT segment 3 | 90 nts
SEQ ID NO: 249 | Polypeptide encoded by SEQ ID NO: 248 | 30 aa
SEQ ID NO: 250 | TAT segment 4 | 90 nts
SEQ ID NO: 251 | Polypeptide encoded by SEQ ID NO: 250 | 30 aa
SEQ ID NO: 252 | TAT segment 5 | 90 nts
SEQ ID NO: 253 | Polypeptide encoded by SEQ ID NO: 252 | 30 aa
SEQ ID NO: 254 | TAT segment 6 | 81 nts
SEQ ID NO: 255 | Polypeptide encoded by SEQ ID NO: 254 | 27 aa
SEQ ID NO: 256 | REV segment 1 | 90 nts
SEQ ID NO: 257 | Polypeptide encoded by SEQ ID NO: 256 | 30 aa
SEQ ID NO: 258 | REV segment 2 | 90 nts
SEQ ID NO: 259 | Polypeptide encoded by SEQ ID NO: 258 | 30 aa
SEQ ID NO: 260 | REV segment 3 | 90 nts
SEQ ID NO: 261 | Polypeptide encoded by SEQ ID NO: 260 | 30 aa
SEQ ID NO: 262 | REV segment 4 | 90 nts
SEQ ID NO: 263 | Polypeptide encoded by SEQ ID NO: 262 | 30 aa
SEQ ID NO: 264 | REV segment 5 | 90 nts
SEQ ID NO: 265 | Polypeptide encoded by SEQ ID NO: 264 | 30 aa
SEQ ID NO: 266 | REV segment 6 | 90 nts
SEQ ID NO: 267 | Polypeptide encoded by SEQ ID NO: 266 | 30 aa
SEQ ID NO: 268 | REV segment 7 | 90 nts
SEQ ID NO: 269 | Polypeptide encoded by SEQ ID NO: 268 | 30 aa
SEQ ID NO: 270 | REV segment 8 | 54 nts
SEQ ID NO: 271 | Polypeptide encoded by SEQ ID NO: 270 | 18 aa
SEQ ID NO: 272 | VPU segment 1 | 90 nts
SEQ ID NO: 273 | Polypeptide encoded by SEQ ID NO: 272 | 30 aa
SEQ ID NO: 274 | VPU segment 2 | 90 nts
SEQ ID NO: 275 | Polypeptide encoded by SEQ ID NO: 274 | 30 aa
SEQ ID NO: 276 | VPU segment 3 | 90 nts
SEQ ID NO: 277 | Polypeptide encoded by SEQ ID NO: 276 | 30 aa
SEQ ID NO: 278 | VPU segment 4 | 90 nts
SEQ ID NO: 279 | Polypeptide encoded by SEQ ID NO: 278 | 30 aa
SEQ ID NO: 280 | VPU segment 5 | 63 nts
SEQ ID NO: 281 | Polypeptide encoded by SEQ ID NO: 280 | 21 aa
SEQ ID NO: 282 | ENV segment 1 | 90 nts
SEQ ID NO: 283 | Polypeptide encoded by SEQ ID NO: 282 | 30 aa
SEQ ID NO: 284 | ENV segment 2 | 90 nts
SEQ ID NO: 285 | Polypeptide encoded by SEQ ID NO: 284 | 30 aa
SEQ ID NO: 286 | ENV segment 3 | 90 nts
SEQ ID NO: 287 | Polypeptide encoded by SEQ ID NO: 286 | 30 aa
SEQ ID NO: 288 | ENV segment 4 | 90 nts
SEQ ID NO: 289 | Polypeptide encoded by SEQ ID NO: 288 | 30 aa
SEQ ID NO: 290 | ENV segment 5 | 90 nts
SEQ ID NO: 291 | Polypeptide encoded by SEQ ID NO: 290 | 30 aa
SEQ ID NO: 292 | ENV segment 6 | 90 nts
SEQ ID NO: 293 | Polypeptide encoded by SEQ ID NO: 292 | 30 aa
SEQ ID NO: 294 | ENV segment 7 | 90 nts
SEQ ID NO: 295 | Polypeptide encoded by SEQ ID NO: 294 | 30 aa
SEQ ID NO: 296 | ENV segment 8 | 90 nts
SEQ ID NO: 297 | Polypeptide encoded by SEQ ID NO: 296 | 30 aa
SEQ ID NO: 298 | ENV segment 9 | 57 nts
SEQ ID NO: 299 | Polypeptide encoded by SEQ ID NO: 298 | 19 aa
SEQ ID NO: 300 | GAP A segment 1 | 90 nts
SEQ ID NO: 301 | Polypeptide encoded by SEQ ID NO: 300 | 30 aa
SEQ ID NO: 302 | GAP A segment 2 | 90 nts
SEQ ID NO: 303 | Polypeptide encoded by SEQ ID NO: 302 | 30 aa
SEQ ID NO: 304 | GAP A segment 3 | 90 nts
SEQ ID NO: 305 | Polypeptide encoded by SEQ ID NO: 304 | 30 aa
SEQ ID NO: 306 | GAP A segment 4 | 90 nts
SEQ ID NO: 307 | Polypeptide encoded by SEQ ID NO: 306 | 30 aa
SEQ ID NO: 308 | GAP A segment 5 | 90 nts
SEQ ID NO: 309 | Polypeptide encoded by SEQ ID NO: 308 | 30 aa
SEQ ID NO: 310 | GAP A segment 6 | 90 nts
SEQ ID NO: 311 | Polypeptide encoded by SEQ ID NO: 310 | 30 aa
SEQ ID NO: 312 | GAP A segment 7 | 75 nts
SEQ ID NO: 313 | Polypeptide encoded by SEQ ID NO: 312 | 25 aa
SEQ ID NO: 314 | GAP B segment 1 | 90 nts
SEQ ID NO: 315 | Polypeptide encoded by SEQ ID NO: 314 | 30 aa
SEQ ID NO: 316 | GAP B segment 2 | 90 nts
SEQ ID NO: 317 | Polypeptide encoded by SEQ ID NO: 316 | 30 aa
SEQ ID NO: 318 | GAP B segment 3 | 90 nts
SEQ ID NO: 319 | Polypeptide encoded by SEQ ID NO: 318 | 30 aa
SEQ ID NO: 320 | GAP B segment 4 | 90 nts
SEQ ID NO: 321 | Polypeptide encoded by SEQ ID NO: 320 | 30 aa
SEQ ID NO: 322 | GAP B segment 5 | 90 nts
SEQ ID NO: 323 | Polypeptide encoded by SEQ ID NO: 322 | 30 aa
SEQ ID NO: 324 | GAP B segment 6 | 90 nts
SEQ ID NO: 325 | Polypeptide encoded by SEQ ID NO: 324 | 30 aa
SEQ ID NO: 326 | GAP B segment 7 | 90 nts
SEQ ID NO: 327 | Polypeptide encoded by SEQ ID NO: 326 | 30 aa
SEQ ID NO: 328 | GAP B segment 8 | 90 nts
SEQ ID NO: 329 | Polypeptide encoded by SEQ ID NO: 328 | 30 aa
SEQ ID NO: 330 | GAP B segment 9 | 90 nts
SEQ ID NO: 331 | Polypeptide encoded by SEQ ID NO: 330 | 30 aa
SEQ ID NO: 332 | GAP B segment 10 | 90 nts
SEQ ID NO: 333 | Polypeptide encoded by SEQ ID NO: 332 | 30 aa
SEQ ID NO: 334 | GAP B segment 11 | 90 nts
SEQ ID NO: 335 | Polypeptide encoded by SEQ ID NO: 334 | 30 aa
SEQ ID NO: 336 | GAP B segment 12 | 90 nts
SEQ ID NO: 337 | Polypeptide encoded by SEQ ID NO: 336 | 30 aa
SEQ ID NO: 338 | GAP B segment 13 | 90 nts
SEQ ID NO: 339 | Polypeptide encoded by SEQ ID NO: 338 | 30 aa
SEQ ID NO: 340 | GAP B segment 14 | 90 nts
SEQ ID NO: 341 | Polypeptide encoded by SEQ ID NO: 340 | 30 aa
SEQ ID NO: 342 | GAP B segment 15 | 90 nts
SEQ ID NO: 343 | Polypeptide encoded by SEQ ID NO: 342 | 30 aa
SEQ ID NO: 344 | GAP B segment 16 | 90 nts
SEQ ID NO: 345 | Polypeptide encoded by SEQ ID NO: 344 | 30 aa
SEQ ID NO: 346 | GAP B segment 17 | 90 nts
SEQ ID NO: 347 | Polypeptide encoded by SEQ ID NO: 346 | 30 aa
SEQ ID NO: 348 | GAP B segment 18 | 90 nts
SEQ ID NO: 349 | Polypeptide encoded by SEQ ID NO: 348 | 30 aa
SEQ ID NO: 350 | GAP B segment 19 | 90 nts
SEQ ID NO: 351 | Polypeptide encoded by SEQ ID NO: 350 | 30 aa
SEQ ID NO: 352 | GAP B segment 20 | 90 nts
SEQ ID NO: 353 | Polypeptide encoded by SEQ ID NO: 352 | 30 aa
SEQ ID NO: 354 | GAP B segment 21 | 90 nts
SEQ ID NO: 355 | Polypeptide encoded by SEQ ID NO: 354 | 30 aa
SEQ ID NO: 356 | GAP B segment 22 | 90 nts
SEQ ID NO: 357 | Polypeptide encoded by SEQ ID NO: 356 | 30 aa
SEQ ID NO: 358 | GAP B segment 23 | 90 nts
SEQ ID NO: 359 | Polypeptide encoded by SEQ ID NO: 358 | 30 aa
SEQ ID NO: 360 | GAP B segment 24 | 90 nts
SEQ ID NO: 361 | Polypeptide encoded by SEQ ID NO: 360 | 30 aa
SEQ ID NO: 362 | GAP B segment 25 | 90 nts
SEQ ID NO: 363 | Polypeptide encoded by SEQ ID NO: 362 | 30 aa
SEQ ID NO: 364 | GAP B segment 26 | 66 nts
SEQ ID NO: 365 | Polypeptide encoded by SEQ ID NO: 364 | 22 aa
SEQ ID NO: 366 | NEF segment 1 | 90 nts
SEQ ID NO: 367 | Polypeptide encoded by SEQ ID NO: 366 | 30 aa
SEQ ID NO: 368 | NEF segment 2 | 90 nts
SEQ ID NO: 369 | Polypeptide encoded by SEQ ID NO: 368 | 30 aa
SEQ ID NO: 370 | NEF segment 3 | 90 nts
SEQ ID NO: 371 | Polypeptide encoded by SEQ ID NO: 370 | 30 aa
SEQ ID NO: 372 | NEF segment 4 | 90 nts
SEQ ID NO: 373 | Polypeptide encoded by SEQ ID NO: 372 | 30 aa
SEQ ID NO: 374 | NEF segment 5 | 90 nts
SEQ ID NO: 375 | Polypeptide encoded by SEQ ID NO: 374 | 30 aa
SEQ ID NO: 376 | NEF segment 6 | 90 nts
SEQ ID NO: 377 | Polypeptide encoded by SEQ ID NO: 376 | 30 aa
SEQ ID NO: 378 | NEF segment 7 | 90 nts
SEQ ID NO: 379 | Polypeptide encoded by SEQ ID NO: 378 | 30 aa
SEQ ID NO: 380 | NEF segment 8 | 90 nts
SEQ ID NO: 381 | Polypeptide encoded by SEQ ID NO: 380 | 30 aa
SEQ ID NO: 382 | NEF segment 9 | 90 nts
SEQ ID NO: 383 | Polypeptide encoded by SEQ ID NO: 382 | 30 aa
SEQ ID NO: 384 | NEF segment 10 | 90 nts
SEQ ID NO: 385 | Polypeptide encoded by SEQ ID NO: 384 | 30 aa
SEQ ID NO: 386 | NEF segment 11 | 90 nts
SEQ ID NO: 387 | Polypeptide encoded by SEQ ID NO: 386 | 30 aa
SEQ ID NO: 388 | NEF segment 12 | 90 nts
SEQ ID NO: 389 | Polypeptide encoded by SEQ ID NO: 388 | 30 aa
SEQ ID NO: 390 | NEF segment 13 | 78 nts
SEQ ID NO: 391 | Polypeptide encoded by SEQ ID NO: 390 | 26 aa
SEQ ID NO: 392 | HIV Cassette A1 | 5703 nts
SEQ ID NO: 393 | Polypeptide encoded by SEQ ID NO: 392 | 1896 aa
SEQ ID NO: 394 | HIV Cassette B1 | 5685 nts
SEQ ID NO: 395 | Polypeptide encoded by SEQ ID NO: 394 | 1890 aa
SEQ ID NO: 396 | HIV Cassette C1 | 5925 nts
SEQ ID NO: 397 | Polypeptide encoded by SEQ ID NO: 396 | 1967 aa
SEQ ID NO: 398 | HIV Cassette A2 | 5703 nts
SEQ ID NO: 399 | Polypeptide encoded by SEQ ID NO: 398 | 1896 aa
SEQ ID NO: 400 | HIV Cassette B2 | 5685 nts
SEQ ID NO: 401 | Polypeptide encoded by SEQ ID NO: 400 | 1890 aa
SEQ ID NO: 402 | HIV Cassette C2 | 5925 nts
SEQ ID NO: 403 | Polypeptide encoded by SEQ ID NO: 402 | 1967 aa
SEQ ID NO: 404 | HIV complete Savine | 17244 nts
SEQ ID NO: 405 | Polypeptide encoded by SEQ ID NO: 404 | 5747 aa
SEQ ID NO: 406 | HepC1a consensus polyprotein sequence | 3011 aa
SEQ ID NO: 407 | HepC1a segment 1 | 90 nts
SEQ ID NO: 408 | Polypeptide encoded by SEQ ID NO: 407 | 30 aa
SEQ ID NO: 409 | HepC1a segment 2 | 90 nts
SEQ ID NO: 410 | Polypeptide encoded by SEQ ID NO: 409 | 30 aa
SEQ ID NO: 411 | HepC1a segment 3 | 90 nts
SEQ ID NO: 412 | Polypeptide encoded by SEQ ID NO: 411 | 30 aa
SEQ ID NO: 413 | HepC1a segment 4 | 90 nts
SEQ ID NO: 414 | Polypeptide encoded by SEQ ID NO: 413 | 30 aa
SEQ ID NO: 415 | HepC1a segment 5 | 90 nts
SEQ ID NO: 416 | Polypeptide encoded by SEQ ID NO: 415 | 30 aa
SEQ ID NO: 417 | HepC1a segment 6 | 90 nts
SEQ ID NO: 418 | Polypeptide encoded by SEQ ID NO: 417 | 30 aa
SEQ ID NO: 419 | HepC1a segment 7 | 90 nts
SEQ ID NO: 420 | Polypeptide encoded by SEQ ID NO: 419 | 30 aa
SEQ ID NO: 421 | HepC1a segment 8 | 90 nts
SEQ ID NO: 422 | Polypeptide encoded by SEQ ID NO: 421 | 30 aa
SEQ ID NO: 423 | HepC1a segment 9 | 90 nts
SEQ ID NO: 424 | Polypeptide encoded by SEQ ID NO: 423 | 30 aa
SEQ ID NO: 425 | HepC1a segment 10 | 90 nts
SEQ ID NO: 426 | Polypeptide encoded by SEQ ID NO: 425 | 30 aa
SEQ ID NO: 427 | HepC1a segment 11 | 90 nts
SEQ ID NO: 428 | Polypeptide encoded by SEQ ID NO: 427 | 30 aa
SEQ ID NO: 429 | HepC1a segment 12 | 90 nts
SEQ ID NO: 430 | Polypeptide encoded by SEQ ID NO: 429 | 30 aa
SEQ ID NO: 431 | HepC1a segment 13 | 90 nts
SEQ ID NO: 432 | Polypeptide encoded by SEQ ID NO: 431 | 30 aa
SEQ ID NO: 433 | HepC1a segment 14 | 90 nts
SEQ ID NO: 434 | Polypeptide encoded by SEQ ID NO: 433 | 30 aa
SEQ ID NO: 435 | HepC1a segment 15 | 90 nts
SEQ ID NO: 436 | Polypeptide encoded by SEQ ID NO: 435 | 30 aa
SEQ ID NO: 437 | HepC1a segment 16 | 90 nts
SEQ ID NO: 438 | Polypeptide encoded by SEQ ID NO: 437 | 30 aa
SEQ ID NO: 439 | HepC1a segment 17 | 90 nts
SEQ ID NO: 440 | Polypeptide encoded by SEQ ID NO: 439 | 30 aa
SEQ ID NO: 441 | HepC1a segment 18 | 90 nts
SEQ ID NO: 442 | Polypeptide encoded by SEQ ID NO: 441 | 30 aa
SEQ ID NO: 443 | HepC1a segment 19 | 90 nts
SEQ ID NO: 444 | Polypeptide encoded by SEQ ID NO: 443 | 30 aa
SEQ ID NO: 445 | HepC1a segment 20 | 90 nts
SEQ ID NO: 446 | Polypeptide encoded by SEQ ID NO: 445 | 30 aa
SEQ ID NO: 447 | HepC1a segment 21 | 90 nts
SEQ ID NO: 448 | Polypeptide encoded by SEQ ID NO: 447 | 30 aa
SEQ ID NO: 449 | HepC1a segment 22 | 90 nts
SEQ ID NO: 450 | Polypeptide encoded by SEQ ID NO: 449 | 30 aa
SEQ ID NO: 451 | HepC1a segment 23 | 90 nts
SEQ ID NO: 452 | Polypeptide encoded by SEQ ID NO: 451 | 30 aa
SEQ ID NO: 453 | HepC1a segment 24 | 90 nts
SEQ ID NO: 454 | Polypeptide encoded by SEQ ID NO: 453 | 30 aa
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oca tt\ xt/*v acc 
ohKl ID NU: 455 


HepCla segment 25 


90nts 


SEQ ID NO: 456 


Polypeptide encoded by SEQ ID NO: 455 


30 aa 


SEQ ID NO: 457 


HepCla segment 26 


90 nts 


SEQ ID NO: 458 


Polypeptide encoded by SEQ ID NO: 457 


30 aa 


SEQ ID NO: 459 


HepCla segment 27 


90 nts 


SEQ ID NO: 460 


Polypeptide encoded by SEQ ID NO: 459 


30 aa 


SEQ ID NO: 461 


HepCla segment 28 


90 nts 


SEQ ID NO: 462 


Polypeptide encoded by SEQ ID NO: 46 1 


30 aa 


SEQ ID NO: 463 


HepCla segment 29 


90 nts 


SEQ ID NO: 464 


Polypeptide encoded by SEQ ID NO: 463 


30 aa 


SEQ ID NO: 465 


HepCla segment 30 


90 nts 


SEQ ID NO: 466 


Polypeptide encoded by SEQ ID NO: 465 


30 aa 


SEQ ID NO; 467 


HepCla segment 3 1 


90 nts 


SEQ ID NO: 468 


Polypeptide encoded by SEQ ID NO: 467 


30 aa 


SEQ ID NO: 469 


HepC1a segment 32


90 nts 


SEQ ID NO: 470 


Polypeptide encoded by SEQ ID NO: 469 


30 aa 


SEQ ID NO: 471


HepC1a segment 33


90 nts


SEQ ID NO: 472


Polypeptide encoded by SEQ ID NO: 471 


30 aa 


SEQ ID NO: 473


HepC1a segment 34


90 nts


SEQ ID NO: 474 


Polypeptide encoded by SEQ ID NO: 473 


30 aa 


SEQ ID NO: 475 


HepC1a segment 35


90 nts 


SEQ ID NO: 476 


Polypeptide encoded by SEQ ID NO: 475 


30 aa 


SEQ ID NO: 477 


HepC1a segment 36


90 nts 


SEQ ID NO: 478 


Polypeptide encoded by SEQ ID NO: 477 


30 aa


SEQ ID NO: 479


HepC1a segment 37


90 nts 


SEQ ID NO: 480


Polypeptide encoded by SEQ ID NO: 479


30 aa 


SEQ ID NO: 481


HepC1a segment 38


90 nts 


SEQ ID NO: 482 


Polypeptide encoded by SEQ ID NO: 481 


30 aa 


SEQ ID NO: 483 


HepC1a segment 39


90 nts 


SEQ ID NO: 484 


Polypeptide encoded by SEQ ID NO: 483 


30 aa 


SEQ ID NO: 485 


HepC1a segment 40


90 nts 


SEQ ID NO: 486 


Polypeptide encoded by SEQ ID NO: 485 


30 aa 


SEQ ID NO: 487 


HepC1a segment 41


90 nts 


SEQ ID NO: 488 


Polypeptide encoded by SEQ ID NO: 487 


30 aa 


SEQ ID NO: 489 


HepC1a segment 42


90 nts 


SEQ ID NO: 490 


Polypeptide encoded by SEQ ID NO: 489 


30 aa 


SEQ ID NO: 491 


HepC1a segment 43


90 nts 


SEQ ID NO: 492 


Polypeptide encoded by SEQ ID NO: 491 


30 aa 


SEQ ID NO: 493


HepC1a segment 44


90 nts 


SEQ ID NO: 494


Polypeptide encoded by SEQ ID NO: 493


30 aa 


SEQ ID NO: 495


HepC1a segment 45


90 nts 


SEQ ID NO: 496


Polypeptide encoded by SEQ ID NO: 495


30 aa 


SEQ ID NO: 497


HepC1a segment 46


90 nts


SEQ ID NO: 498


Polypeptide encoded by SEQ ID NO: 497 


30 aa 


SEQ ID NO: 499 


HepC1a segment 47


90 nts 


SEQ ID NO: 500 


Polypeptide encoded by SEQ ID NO: 499 


30 aa 


SEQ ID NO: 501 


HepC1a segment 48


90 nts 


SEQ ID NO: 502 


Polypeptide encoded by SEQ ID NO: 501 


30 aa


SEQ ID NO: 503


HepC1a segment 49


90 nts


SEQ ID NO: 504 


Polypeptide encoded by SEQ ID NO: 503 


30 aa 


SEQ ID NO: 505 


HepC1a segment 50


90 nts 


SEQ ID NO: 506 


Polypeptide encoded by SEQ ID NO: 505 


30 aa 


SEQ ID NO: 507 


HepC1a segment 51


90 nts 


SEQ ID NO: 508 


Polypeptide encoded by SEQ ID NO: 507 


30 aa 


SEQ ID NO: 509 


HepC1a segment 52


90 nts 


SEQ ID NO: 510 


Polypeptide encoded by SEQ ID NO: 509 


30 aa 


SEQ ID NO: 511 


HepC1a segment 53


90 nts 


SEQ ID NO: 512 


Polypeptide encoded by SEQ ID NO: 511 


30 aa 


SEQ ID NO: 513 


HepC1a segment 54


90 nts 


SEQ ID NO: 514 


Polypeptide encoded by SEQ ID NO: 513 


30 aa 


SEQ ID NO: 515 


HepC1a segment 55


90 nts 


SEQ ID NO: 516 


Polypeptide encoded by SEQ ID NO: 515 


30 aa 


SEQ ID NO: 517 


HepC1a segment 56


90 nts 


SEQ ID NO: 518 


Polypeptide encoded by SEQ ID NO: 517 


30 aa 


SEQ ID NO: 519 


HepC1a segment 57


90 nts


SEQ ID NO: 520


Polypeptide encoded by SEQ ID NO: 519 


30 aa 




SEQ ID NO: 521


HepC1a segment 58


90 nts 


SEQ ID NO: 522 


Polypeptide encoded by SEQ ID NO: 521 


30 aa 


SEQ ID NO: 523 


HepC1a segment 59


90 nts 


SEQ ID NO: 524 


Polypeptide encoded by SEQ ID NO: 523 


30 aa 


SEQ ID NO: 525 


HepC1a segment 60


90 nts 


SEQ ID NO: 526 


Polypeptide encoded by SEQ ID NO: 525 


30 aa


SEQ ID NO: 527


HepC1a segment 61


90 nts


SEQ ID NO: 528


Polypeptide encoded by SEQ ID NO: 527 


30 aa 


SEQ ID NO: 529 


HepC1a segment 62


90 nts 


SEQ ID NO: 530 


Polypeptide encoded by SEQ ID NO: 529 


30 aa 


SEQ ID NO: 531 


HepC1a segment 63


90 nts 


SEQ ID NO: 532 


Polypeptide encoded by SEQ ID NO: 531


30 aa 


SEQ ID NO: 533 


HepC1a segment 64


90 nts 


SEQ ID NO: 534 


Polypeptide encoded by SEQ ID NO: 533 


30 aa 


SEQ ID NO: 535 


HepC1a segment 65


90 nts 


SEQ ID NO: 536 


Polypeptide encoded by SEQ ID NO: 535 


30 aa 


SEQ ID NO: 537 


HepC1a segment 66


90 nts 


SEQ ID NO: 538 


Polypeptide encoded by SEQ ID NO: 537 


30 aa 


SEQ ID NO: 539 


HepC1a segment 67


90 nts 


SEQ ID NO: 540 


Polypeptide encoded by SEQ ID NO: 539 


30 aa 


SEQ ID NO: 541


HepC1a segment 68


90 nts 


SEQ ID NO: 542


Polypeptide encoded by SEQ ID NO: 541 


30 aa 


SEQ ID NO: 543


HepC1a segment 69


90 nts 


SEQ ID NO: 544


Polypeptide encoded by SEQ ID NO: 543


30 aa 


SEQ ID NO: 545


HepC1a segment 70


90 nts


SEQ ID NO: 546


Polypeptide encoded by SEQ ID NO: 545


30 aa 


SEQ ID NO: 547 


HepC1a segment 71


90 nts 


SEQ ID NO: 548 


Polypeptide encoded by SEQ ID NO: 547 


30 aa 


SEQ ID NO: 549 


HepC1a segment 72


90 nts 


SEQ ID NO: 550 


Polypeptide encoded by SEQ ID NO: 549 


30 aa 



WO 01/090197 



PCT/AU01/00622 










SEQ ID NO: 551


HepC1a segment 73


90 nts 


SEQ ID NO: 552 


Polypeptide encoded by SEQ ID NO: 551 


30 aa 


SEQ ID NO: 553 


HepC1a segment 74


90 nts 


SEQ ID NO: 554 


Polypeptide encoded by SEQ ID NO: 553 


30 aa 


SEQ ID NO: 555 


HepC1a segment 75


90 nts 


SEQ ID NO: 556 


Polypeptide encoded by SEQ ID NO: 555 


30 aa 


SEQ ID NO: 557 


HepC1a segment 76


90 nts 


SEQ ID NO: 558 


Polypeptide encoded by SEQ ID NO: 557 


30 aa 


SEQ ID NO: 559 


HepC1a segment 77


90 nts 


SEQ ID NO: 560 


Polypeptide encoded by SEQ ID NO: 559 


30 aa 


SEQ ID NO: 561 


HepC1a segment 78


90 nts 


SEQ ID NO: 562 


Polypeptide encoded by SEQ ID NO: 561 


30 aa 


SEQ ID NO: 563 


HepC1a segment 79


90 nts 


SEQ ID NO: 564 


Polypeptide encoded by SEQ ID NO: 563 


30 aa 


SEQ ID NO: 565 


HepC1a segment 80


90 nts 


SEQ ID NO: 566 


Polypeptide encoded by SEQ ID NO: 565 


30 aa 


SEQ ID NO: 567 


HepC1a segment 81


90 nts 


SEQ ID NO: 568


Polypeptide encoded by SEQ ID NO: 567 


30 aa 




SEQ ID NO: 569


HepC1a segment 82


90 nts


SEQ ID NO: 570


Polypeptide encoded by SEQ ID NO: 569


30 aa 


SEQ ID NO: 571


HepC1a segment 83


90 nts


SEQ ID NO: 572


Polypeptide encoded by SEQ ID NO: 571 


30 aa 


SEQ ID NO: 573


HepC1a segment 84


90 nts


SEQ ID NO: 574


Polypeptide encoded by SEQ ID NO: 573 


30 aa


SEQ ID NO: 575


HepC1a segment 85


90 nts


SEQ ID NO: 576


Polypeptide encoded by SEQ ID NO: 575


30 aa 


SEQ ID NO: 577


HepC1a segment 86


90 nts


SEQ ID NO: 578


Polypeptide encoded by SEQ ID NO: 577 


30 aa 


SEQ ID NO: 579


HepC1a segment 87


90 nts


SEQ ID NO: 580


Polypeptide encoded by SEQ ID NO: 579 


30 aa 


SEQ ID NO: 581


HepC1a segment 88


90 nts


SEQ ID NO: 582


Polypeptide encoded by SEQ ID NO: 581 


30 aa 


SEQ ID NO: 583 


HepC1a segment 89


90 nts


SEQ ID NO: 584


Polypeptide encoded by SEQ ID NO: 583 


30 aa 


SEQ ID NO: 585 


HepC1a segment 90


90 nts


SEQ ID NO: 586


Polypeptide encoded by SEQ ID NO: 585 


30 aa 


SEQ ID NO: 587


HepC1a segment 91


90 nts


SEQ ID NO: 588


Polypeptide encoded by SEQ ID NO: 587 


30 aa 


SEQ ID NO: 589


HepC1a segment 92


90 nts


SEQ ID NO: 590


Polypeptide encoded by SEQ ID NO: 589 


30 aa 


SEQ ID NO: 591


HepC1a segment 93


90 nts


SEQ ID NO: 592


Polypeptide encoded by SEQ ID NO: 591


30 aa


SEQ ID NO: 593


HepC1a segment 94


90 nts


SEQ ID NO: 594 


Polypeptide encoded by SEQ ID NO: 593 


30 aa 


SEQ ID NO: 595 


HepC1a segment 95


90 nts 


SEQ ID NO: 596 


Polypeptide encoded by SEQ ID NO: 595 


30 aa 


SEQ ID NO: 597 


HepC1a segment 96


90 nts 


SEQ ID NO: 598 


Polypeptide encoded by SEQ ID NO: 597 


30 aa


SEQ ID NO: 599


HepC1a segment 97


90 nts


SEQ ID NO: 600 


Polypeptide encoded by SEQ ID NO: 599 


30 aa 


SEQ ID NO: 601 


HepC1a segment 98


90 nts 


SEQ ID NO: 602 


Polypeptide encoded by SEQ ID NO: 601


30 aa 


SEQ ID NO: 603 


HepC1a segment 99


90 nts 


SEQ ID NO: 604 


Polypeptide encoded by SEQ ID NO: 603 


30 aa 


SEQ ID NO: 605 


HepC1a segment 100


90 nts 


SEQ ID NO: 606 


Polypeptide encoded by SEQ ID NO: 605 


30 aa 


SEQ ID NO: 607 


HepC1a segment 101


90 nts 


SEQ ID NO: 608 


Polypeptide encoded by SEQ ID NO: 607 


30 aa 


SEQ ID NO: 609 


HepC1a segment 102


90 nts 


SEQ ID NO: 610 


Polypeptide encoded by SEQ ID NO: 609 


30 aa 


SEQ ID NO: 611 


HepC1a segment 103


90 nts 


SEQ ID NO: 612 


Polypeptide encoded by SEQ ID NO: 611


30 aa 


SEQ ID NO: 613 


HepC1a segment 104


90 nts 


SEQ ID NO: 614 


Polypeptide encoded by SEQ ID NO: 613 


30 aa 


SEQ ID NO: 615 


HepC1a segment 105


90 nts 


SEQ ID NO: 616 


Polypeptide encoded by SEQ ID NO: 615 


30 aa 


SEQ ID NO: 617


HepC1a segment 106


90 nts 


SEQ ID NO: 618 


Polypeptide encoded by SEQ ID NO: 617 


30 aa 


SEQ ID NO: 619


HepC1a segment 107


90 nts 


SEQ ID NO: 620 


Polypeptide encoded by SEQ ID NO: 619 


30 aa 


SEQ ID NO: 621 


HepC1a segment 108


90 nts 


SEQ ID NO: 622 


Polypeptide encoded by SEQ ID NO: 621


30 aa


SEQ ID NO: 623


HepC1a segment 109


90 nts


SEQ ID NO: 624 


Polypeptide encoded by SEQ ID NO: 623 


30 aa 


SEQ ID NO: 625 


HepC1a segment 110


90 nts 


SEQ ID NO: 626 


Polypeptide encoded by SEQ ID NO: 625 


30 aa 


SEQ ID NO: 627 


HepC1a segment 111


90 nts 


SEQ ID NO: 628 


Polypeptide encoded by SEQ ID NO: 627 


30 aa 


SEQ ID NO: 629 


HepC1a segment 112


90 nts 


SEQ ID NO: 630 


Polypeptide encoded by SEQ ID NO: 629 


30 aa 


SEQ ID NO: 631 


HepC1a segment 113


90 nts 


SEQ ID NO: 632 


Polypeptide encoded by SEQ ID NO: 631 


30 aa 


SEQ ID NO: 633 


HepC1a segment 114


90 nts 


SEQ ID NO: 634 


Polypeptide encoded by SEQ ID NO: 633 


30 aa 


SEQ ID NO: 635 


HepC1a segment 115


90 nts 


SEQ ID NO: 636 


Polypeptide encoded by SEQ ID NO: 635 


30 aa 


SEQ ID NO: 637 


HepC1a segment 116


90 nts 


SEQ ID NO: 638 


Polypeptide encoded by SEQ ID NO: 637 


30 aa 


SEQ ID NO: 639


HepC1a segment 117


90 nts 


SEQ ID NO: 640 


Polypeptide encoded by SEQ ID NO: 639 


30 aa 


SEQ ID NO: 641


HepC1a segment 118


90 nts 


SEQ ID NO: 642 


Polypeptide encoded by SEQ ID NO: 641 


30 aa 


SEQ ID NO: 643 


HepC1a segment 119


90 nts 


SEQ ID NO: 644 


Polypeptide encoded by SEQ ID NO: 643 


30 aa 


SEQ ID NO: 645 


HepC1a segment 120


90 nts 


SEQ ID NO: 646 


Polypeptide encoded by SEQ ID NO: 645 


30 aa


SEQ ID NO: 647


HepC1a segment 121


90 nts 


SEQ ID NO: 648 


Polypeptide encoded by SEQ ID NO: 647 


30 aa 


SEQ ID NO: 649 


HepC1a segment 122


90 nts 


SEQ ID NO: 650 


Polypeptide encoded by SEQ ID NO: 649 


30 aa 


SEQ ID NO: 651 


HepC1a segment 123


90 nts 


SEQ ID NO: 652 


Polypeptide encoded by SEQ ID NO: 651 


30 aa 


SEQ ID NO: 653 


HepC1a segment 124


90 nts 


SEQ ID NO: 654 


Polypeptide encoded by SEQ ID NO: 653 


30 aa 


SEQ ID NO: 655 


HepC1a segment 125


90 nts 


SEQ ID NO: 656 


Polypeptide encoded by SEQ ID NO: 655 


30 aa 


SEQ ID NO: 657 


HepC1a segment 126


90 nts 


SEQ ID NO: 658 


Polypeptide encoded by SEQ ID NO: 657 


30 aa 


SEQ ID NO: 659 


HepC1a segment 127


90 nts 


SEQ ID NO: 660 


Polypeptide encoded by SEQ ID NO: 659 


30 aa 


SEQ ID NO: 661 


HepC1a segment 128


90 nts 


SEQ ID NO: 662 


Polypeptide encoded by SEQ ID NO: 661 


30 aa 


SEQ ID NO: 663 


HepC1a segment 129


90 nts 


SEQ ID NO: 664 


Polypeptide encoded by SEQ ID NO: 663 


30 aa 


SEQ ID NO: 665


HepC1a segment 130


90 nts 


SEQ ID NO: 666 


Polypeptide encoded by SEQ ID NO: 665 


30 aa 


SEQ ID NO: 667 


HepC1a segment 131


90 nts 


SEQ ID NO: 668 


Polypeptide encoded by SEQ ID NO: 667 


30 aa 


SEQ ID NO: 669 


HepC1a segment 132


90 nts 


SEQ ID NO: 670 


Polypeptide encoded by SEQ ID NO: 669 


30 aa


SEQ ID NO: 671


HepC1a segment 133


90 nts 


SEQ ID NO: 672 


Polypeptide encoded by SEQ ID NO: 671 


30 aa 


SEQ ID NO: 673 


HepC1a segment 134


90 nts 


SEQ ID NO: 674 


Polypeptide encoded by SEQ ID NO: 673 


30 aa 


SEQ ID NO: 675 


HepC1a segment 135


90 nts 


SEQ ID NO: 676 


Polypeptide encoded by SEQ ID NO: 675 


30 aa 


SEQ ID NO: 677 


HepC1a segment 136


90 nts


SEQ ID NO: 678 


Polypeptide encoded by SEQ ID NO: 677 


30 aa 


SEQ ID NO: 679 


HepC1a segment 137


90 nts 


SEQ ID NO: 680 


Polypeptide encoded by SEQ ID NO: 679 


30 aa 


SEQ ID NO: 681


HepC1a segment 138


90 nts 


SEQ ID NO: 682 


Polypeptide encoded by SEQ ID NO: 681 


30 aa 


SEQ ID NO: 683 


HepC1a segment 139


90 nts 


SEQ ID NO: 684 


Polypeptide encoded by SEQ ID NO: 683 


30 aa 


SEQ ID NO: 685 


HepC1a segment 140


90 nts 


SEQ ID NO: 686 


Polypeptide encoded by SEQ ID NO: 685 


30 aa 


SEQ ID NO: 687 


HepC1a segment 141


90 nts 


SEQ ID NO: 688 


Polypeptide encoded by SEQ ID NO: 687 


30 aa 


SEQ ID NO: 689


HepC1a segment 142


90 nts 


SEQ ID NO: 690 


Polypeptide encoded by SEQ ID NO: 689 


30 aa 


SEQ ID NO: 691 


HepC1a segment 143


90 nts 


SEQ ID NO: 692 


Polypeptide encoded by SEQ ID NO: 691 


30 aa 


SEQ ID NO: 693 


HepC1a segment 144


90 nts 


SEQ ID NO: 694 


Polypeptide encoded by SEQ ID NO: 693 


30 aa


SEQ ID NO: 695


HepC1a segment 145


90 nts


SEQ ID NO: 696 


Polypeptide encoded by SEQ ID NO: 695 


30 aa 


SEQ ID NO: 697 


HepC1a segment 146


90 nts 


SEQ ID NO: 698 


Polypeptide encoded by SEQ ID NO: 697 


30 aa 


SEQ ID NO: 699 


HepC1a segment 147


90 nts 


SEQ ID NO: 700 


Polypeptide encoded by SEQ ID NO: 699 


30 aa 


SEQ ID NO: 701 


HepC1a segment 148


90 nts 


SEQ ID NO: 702 


Polypeptide encoded by SEQ ID NO: 701 


30 aa 


SEQ ID NO: 703 


HepC1a segment 149


90 nts 


SEQ ID NO: 704 


Polypeptide encoded by SEQ ID NO: 703 


30 aa 


SEQ ID NO: 705 


HepC1a segment 150


90 nts 


SEQ ID NO: 706 


Polypeptide encoded by SEQ ID NO: 705 


30 aa 


SEQ ID NO: 707 


HepC1a segment 151


90 nts 


SEQ ID NO: 708 


Polypeptide encoded by SEQ ID NO: 707 


30 aa 


SEQ ID NO: 709


HepC1a segment 152


90 nts 


SEQ ID NO: 710 


Polypeptide encoded by SEQ ID NO: 709 


30 aa 


SEQ ID NO: 711 


HepC1a segment 153


90 nts 


SEQ ID NO: 712 


Polypeptide encoded by SEQ ID NO: 711


30 aa 


SEQ ID NO: 713


HepC1a segment 154


90 nts 


SEQ ID NO: 714 


Polypeptide encoded by SEQ ID NO: 713 


30 aa 


SEQ ID NO: 715


HepC1a segment 155


90 nts


SEQ ID NO: 716


Polypeptide encoded by SEQ ID NO: 715 


30 aa 


SEQ ID NO: 717


HepC1a segment 156


90 nts 


SEQ ID NO: 718 


Polypeptide encoded by SEQ ID NO: 717 


30 aa


SEQ ID NO: 719


HepC1a segment 157


90 nts


SEQ ID NO: 720


Polypeptide encoded by SEQ ID NO: 719 


30 aa 


SEQ ID NO: 721


HepC1a segment 158


90 nts


SEQ ID NO: 722


Polypeptide encoded by SEQ ID NO: 721 


30 aa 


SEQ ID NO: 723


HepC1a segment 159


90 nts


SEQ ID NO: 724


Polypeptide encoded by SEQ ID NO: 723 


30 aa 


SEQ ID NO: 725


HepC1a segment 160


90 nts


SEQ ID NO: 726


Polypeptide encoded by SEQ ID NO: 725 


30 aa 


SEQ ID NO: 727


HepC1a segment 161


90 nts


SEQ ID NO: 728


Polypeptide encoded by SEQ ID NO: 727 


30 aa 


SEQ ID NO: 729


HepC1a segment 162


90 nts


SEQ ID NO: 730


Polypeptide encoded by SEQ ID NO: 729 


30 aa 


SEQ ID NO: 731


HepC1a segment 163


90 nts


SEQ ID NO: 732


Polypeptide encoded by SEQ ID NO: 731 


30 aa 


SEQ ID NO: 733


HepC1a segment 164


90 nts


SEQ ID NO: 734


Polypeptide encoded by SEQ ID NO: 733 


30 aa 


SEQ ID NO: 735


HepC1a segment 165


90 nts


SEQ ID NO: 736


Polypeptide encoded by SEQ ID NO: 735 


30 aa 


SEQ ID NO: 737


HepC1a segment 166


90 nts


SEQ ID NO: 738


Polypeptide encoded by SEQ ID NO: 737 


30 aa 


SEQ ID NO: 739


HepC1a segment 167


90 nts


SEQ ID NO: 740


Polypeptide encoded by SEQ ID NO: 739 


30 aa 


SEQ ID NO: 741


HepC1a segment 168


90 nts


SEQ ID NO: 742


Polypeptide encoded by SEQ ID NO: 741 


30 aa


SEQ ID NO: 743


HepC1a segment 169


90 nts


SEQ ID NO: 744 


Polypeptide encoded by SEQ ID NO: 743 


30 aa 


SEQ ID NO: 745 


HepC1a segment 170


90 nts 


SEQ ID NO: 746 


Polypeptide encoded by SEQ ID NO: 745 


30 aa 


SEQ ID NO: 747 


HepC1a segment 171


90 nts 


SEQ ID NO: 748 


Polypeptide encoded by SEQ ID NO: 747 


30 aa 


SEQ ID NO: 749 


HepC1a segment 172


90 nts 


SEQ ID NO: 750 


Polypeptide encoded by SEQ ID NO: 749 


30 aa 


SEQ ID NO: 751 


HepC1a segment 173


90 nts 


SEQ ID NO: 752 


Polypeptide encoded by SEQ ID NO: 751 


30 aa 


SEQ ID NO: 753 


HepC1a segment 174


90 nts 


SEQ ID NO: 754 


Polypeptide encoded by SEQ ID NO: 753 


30 aa 


SEQ ID NO: 755 


HepC1a segment 175


90 nts 


SEQ ID NO: 756 


Polypeptide encoded by SEQ ID NO: 755 


30 aa 


SEQ ID NO: 757 


HepC1a segment 176


90 nts 


SEQ ID NO: 758 


Polypeptide encoded by SEQ ID NO: 757 


30 aa 


SEQ ID NO: 759 


HepC1a segment 177


90 nts 


SEQ ID NO: 760 


Polypeptide encoded by SEQ ID NO: 759 


30 aa 




SEQ ID NO: 761


HepC1a segment 178


90 nts


SEQ ID NO: 762


Polypeptide encoded by SEQ ID NO: 761 


30 aa 


SEQ ID NO: 763 


HepC1a segment 179


90 nts 


SEQ ID NO: 764 


Polypeptide encoded by SEQ ID NO: 763 


30 aa 


SEQ ID NO: 765 


HepC1a segment 180


90 nts 


SEQ ID NO: 766 


Polypeptide encoded by SEQ ID NO: 765 


30 aa


SEQ ID NO: 767


HepC1a segment 181


90 nts


SEQ ID NO: 768 


Polypeptide encoded by SEQ ID NO: 767 


30 aa 


SEQ ID NO: 769 


HepC1a segment 182


90 nts


SEQ ID NO: 770 


Polypeptide encoded by SEQ ID NO: 769 


30 aa 


SEQ ID NO: 771 


HepC1a segment 183


90 nts 


SEQ ID NO: 772 


Polypeptide encoded by SEQ ID NO: 771 


30 aa 


SEQ ID NO: 773 


HepC1a segment 184


90 nts 


SEQ ID NO: 774 


Polypeptide encoded by SEQ ID NO: 773


30 aa 


SEQ ID NO: 775 


HepC1a segment 185


90 nts 


SEQ ID NO: 776 


Polypeptide encoded by SEQ ID NO: 775 


30 aa 


SEQ ID NO: 777 


HepC1a segment 186


90 nts 


SEQ ID NO: 778 


Polypeptide encoded by SEQ ID NO: 777 


30 aa 


SEQ ID NO: 779 


HepC1a segment 187


90 nts


SEQ ID NO: 780 


Polypeptide encoded by SEQ ID NO: 779 


30 aa 


SEQ ID NO: 781 


HepC1a segment 188


90 nts 


SEQ ID NO: 782 


Polypeptide encoded by SEQ ID NO: 781 


30 aa 


SEQ ID NO: 783 


HepC1a segment 189


90 nts


SEQ ID NO: 784


Polypeptide encoded by SEQ ID NO: 783 


30 aa 




SEQ ID NO: 785


HepC1a segment 190


90 nts


SEQ ID NO: 786


Polypeptide encoded by SEQ ID NO: 785


30 aa 


SEQ ID NO: 787 


HepC1a segment 191


90 nts 


SEQ ID NO: 788 


Polypeptide encoded by SEQ ID NO: 787 


30 aa 


SEQ ID NO: 789 


HepC1a segment 192


90 nts 


SEQ ID NO: 790 


Polypeptide encoded by SEQ ID NO: 789 


30 aa


SEQ ID NO: 791


HepC1a segment 193


90 nts


SEQ ID NO: 792


Polypeptide encoded by SEQ ID NO: 791


30 aa 


SEQ ID NO: 793


HepC1a segment 194


90 nts


SEQ ID NO: 794


Polypeptide encoded by SEQ ID NO: 793


30 aa 


SEQ ID NO: 795


HepC1a segment 195


90 nts


SEQ ID NO: 796


Polypeptide encoded by SEQ ID NO: 795 


30 aa 


SEQ ID NO: 797


HepC1a segment 196


90 nts


SEQ ID NO: 798


Polypeptide encoded by SEQ ID NO: 797 


30 aa 


SEQ ID NO: 799


HepC1a segment 197


90 nts


SEQ ID NO: 800


Polypeptide encoded by SEQ ID NO: 799 


30 aa 


SEQ ID NO: 801


HepC1a segment 198


90 nts


SEQ ID NO: 802


Polypeptide encoded by SEQ ID NO: 801


30 aa 


SEQ ID NO: 803


HepC1a segment 199


90 nts


SEQ ID NO: 804


Polypeptide encoded by SEQ ID NO: 803


30 aa 




SEQ ID NO: 805


HepC1a segment 200


90 nts


SEQ ID NO: 806


Polypeptide encoded by SEQ ID NO: 805


30 aa


SEQ ID NO: 807


HepC1a segment 201


45 nts


SEQ ID NO: 808


Polypeptide encoded by SEQ ID NO: 807


15 aa 


SEQ ID NO: 809


HepC1a scrambled


17955 nts


SEQ ID NO: 810


Polypeptide encoded by SEQ ID NO: 809 


5985 aa 


SEQ ID NO: 811


HepC Cassette A


6065 nts


SEQ ID NO: 812


Polypeptide encoded by SEQ ID NO: 811


2011 aa 


SEQ ID NO: 813


HepC Cassette B


6069 nts


SEQ ID NO: 814


Polypeptide encoded by SEQ ID NO: 813 


2010 aa


SEQ ID NO: 815


HepC Cassette C


6030 nts


SEQ ID NO: 816


Polypeptide encoded by SEQ ID NO: 815 


1997 aa 


SEQ ID NO: 817


gp100 consensus polypeptide


661 aa


SEQ ID NO: 818


MART consensus polypeptide


118 aa


SEQ ID NO: 819 


TRP-1 consensus polypeptide 


248 aa 


SEQ ID NO: 820 


Tyros consensus polypeptide 


529 aa 


SEQ ID NO: 821


TRP2 consensus polypeptide 


519 aa 


SEQ ID NO: 822 


MC1R consensus polypeptide 


317 aa 


SEQ ID NO: 823


MUC1F consensus polypeptide 


125 aa 


SEQ ID NO: 824 


MUC1R consensus polypeptide 


312 aa 


SEQ ID NO: 825 


BAGE consensus polypeptide 


43 aa 


SEQ ID NO: 826 


GAGE-1 consensus polypeptide 


138 aa 


SEQ ID NO: 827 


gp100In4 consensus polypeptide


51 aa 


SEQ ID NO: 828 


MAGE-1 consensus polypeptide 


309 aa 


SEQ ID NO: 829 


MAGE-3 consensus polypeptide 


314 aa 


SEQ ID NO: 830 


PRAME consensus polypeptide 


509 aa 


SEQ ID NO: 831


TRP2IN2 consensus polypeptide


54 aa 


SEQ ID NO: 832 


NYNSO1a consensus polypeptide


180 aa 


SEQ ID NO: 833 


NYNSO1b consensus polypeptide


58 aa 


SEQ ID NO: 834 


LAGE1 consensus polypeptide 


180 aa 


SEQ ID NO: 835 


gp100 segment 1


90 nts


SEQ ID NO: 836


Polypeptide encoded by SEQ ID NO: 835 


30 aa 


SEQ ID NO: 837 


gp100 segment 2


90 nts


SEQ ID NO: 838


Polypeptide encoded by SEQ ID NO: 837 


30 aa


SEQ ID NO: 839


gp100 segment 3


90 nts


SEQ ID NO: 840


Polypeptide encoded by SEQ ID NO: 839 


30 aa 


SEQ ID NO: 841


gp100 segment 4


90 nts


SEQ ID NO: 842 


Polypeptide encoded by SEQ ID NO: 841 


30 aa 


SEQ ID NO: 843 


gp100 segment 5


90 nts


SEQ ID NO: 844 


Polypeptide encoded by SEQ ID NO: 843 


30 aa 


SEQ ID NO: 845 


gp100 segment 6


90 nts


SEQ ID NO: 846 


Polypeptide encoded by SEQ ID NO: 845 


30 aa 


SEQ ID NO: 847 


gp100 segment 7


90 nts


SEQ ID NO: 848 


Polypeptide encoded by SEQ ID NO: 847 


30 aa 


SEQ ID NO: 849 


gp100 segment 8


90 nts


SEQ ID NO: 850 


Polypeptide encoded by SEQ ID NO: 849 


30 aa 


SEQ ID NO: 851 


gp100 segment 9


90 nts 


SEQ ID NO: 852 


Polypeptide encoded by SEQ ID NO: 851 


30 aa 


SEQ ID NO: 853 


gp100 segment 10


90 nts 


SEQ ID NO: 854 


Polypeptide encoded by SEQ ID NO: 853 


30 aa 


SEQ ID NO: 855 


gp100 segment 11


90 nts 


SEQ ID NO: 856 


Polypeptide encoded by SEQ ID NO: 855 


30 aa 


SEQ ID NO: 857


gp100 segment 12


90 nts 


SEQ ID NO: 858 


Polypeptide encoded by SEQ ID NO: 857 


30 aa 


SEQ ID NO: 859 


gp100 segment 13


90 nts 


SEQ ID NO: 860 


Polypeptide encoded by SEQ ID NO: 859 


30 aa 


SEQ ID NO: 861 


gp100 segment 14


90 nts 


SEQ ID NO: 862 


Polypeptide encoded by SEQ ID NO: 861 


30 aa


SEQ ID NO: 863


gp100 segment 15


90 nts 


SEQ ID NO: 864 


Polypeptide encoded by SEQ ID NO: 863 


30 aa 


SEQ ID NO: 865 


gp100 segment 16


90 nts 


SEQ ID NO: 866 


Polypeptide encoded by SEQ ID NO: 865 


30 aa 


SEQ ID NO: 867 


gp100 segment 17


90 nts 


SEQ ID NO: 868 


Polypeptide encoded by SEQ ID NO: 867 


30 aa 


SEQ ID NO: 869 


gp100 segment 18


90 nts 


SEQ ID NO: 870 


Polypeptide encoded by SEQ ID NO: 869 


30 aa 


SEQ ID NO: 871 


gp100 segment 19


90 nts 


SEQ ID NO: 872 


Polypeptide encoded by SEQ ID NO: 871 


30 aa 


SEQ ID NO: 873 


gp100 segment 20


90 nts 


SEQ ID NO: 874 


Polypeptide encoded by SEQ ID NO: 873 


30 aa 


SEQ ID NO: 875 


gp100 segment 21


90 nts 


SEQ ID NO: 876 


Polypeptide encoded by SEQ ID NO: 875 


30 aa 


SEQ ID NO: 877 


gp100 segment 22


90 nts 


SEQ ID NO: 878 


Polypeptide encoded by SEQ ID NO: 877 


30 aa 


SEQ ID NO: 879 


gp100 segment 23


90 nts


SEQ ID NO: 880


Polypeptide encoded by SEQ ID NO: 879 


30 aa


SEQ ID NO: 881


gp100 segment 24


90 nts 


SEQ ID NO: 882 


Polypeptide encoded by SEQ ID NO: 881 


30 aa 


SEQ ID NO: 883


gp100 segment 25


90 nts 


SEQ ID NO: 884 


Polypeptide encoded by SEQ ID NO: 883 


30 aa 


SEQ ID NO: 885 


gp100 segment 26


90 nts 


SEQ ID NO: 886 


Polypeptide encoded by SEQ ID NO: 885 


30 aa


SEQ ID NO: 887


gp100 segment 27


90 nts


SEQ ID NO: 888


Polypeptide encoded by SEQ ID NO: 887 


30 aa 


SEQ ID NO: 889 


gp100 segment 28


90 nts 


SEQ ID NO: 890 


Polypeptide encoded by SEQ ID NO: 889 


30 aa 


SEQ ID NO: 891 


gp100 segment 29


90 nts 


SEQ ID NO: 892 


Polypeptide encoded by SEQ ID NO: 891 


30 aa 


SEQ ID NO: 893 


gp100 segment 30


90 nts 


SEQ ID NO: 894 


Polypeptide encoded by SEQ ID NO: 893 


30 aa 


SEQ ID NO: 895 


gp100 segment 31


90 nts 


SEQ ID NO: 896 


Polypeptide encoded by SEQ ID NO: 895 


30 aa 


SEQ ID NO: 897 


gp100 segment 32


90 nts 


SEQ ID NO: 898 


Polypeptide encoded by SEQ ID NO: 897 


30 aa 


SEQ ID NO: 899 


gp100 segment 33


90 nts 


SEQ ID NO: 900 


Polypeptide encoded by SEQ ID NO: 899 


30 aa 


SEQ ID NO: 901 


gp100 segment 34


90 nts 


SEQ ID NO: 902 


Polypeptide encoded by SEQ ID NO: 901 


30 aa 


SEQ ID NO: 903 


gp100 segment 35


90 nts 


SEQ ID NO: 904 


Polypeptide encoded by SEQ ID NO: 903 


30 aa 


SEQ ID NO: 905 


gp100 segment 36


90 nts


SEQ ID NO: 906 


Polypeptide encoded by SEQ ID NO: 905 


30 aa 


SEQ ID NO: 907


gp100 segment 37


90 nts 


SEQ ID NO: 908 


Polypeptide encoded by SEQ ID NO: 907 


30 aa 


SEQ ID NO: 909 


gp100 segment 38


90 nts 


SEQ ID NO: 910 


Polypeptide encoded by SEQ ID NO: 909 


30 aa


SEQ ID NO: 911


gp100 segment 39


90 nts


SEQ ID NO: 912


Polypeptide encoded by SEQ ID NO: 911


30 aa 


SEQ ID NO: 913


gp100 segment 40


90 nts


SEQ ID NO: 914


Polypeptide encoded by SEQ ID NO: 913 


30 aa 


SEQ ID NO: 915


gp100 segment 41


90 nts


SEQ ID NO: 916


Polypeptide encoded by SEQ ID NO: 915 


30 aa 


SEQ ID NO: 917


gp100 segment 42


90 nts


SEQ ID NO: 918


Polypeptide encoded by SEQ ID NO: 917 


30 aa 


SEQ ID NO: 919


gp100 segment 43


90 nts


SEQ ID NO: 920


Polypeptide encoded by SEQ ID NO: 919


30 aa 


SEQ ID NO: 921


gp100 segment 44


60 nts


SEQ ID NO: 922


Polypeptide encoded by SEQ ID NO: 921 


20 aa 


SEQ ID NO: 923


MART segment 1


90 nts


SEQ ID NO: 924


Polypeptide encoded by SEQ ID NO: 923 


30 aa 


SEQ ID NO: 925


MART segment 2


90 nts


SEQ ID NO: 926


Polypeptide encoded by SEQ ID NO: 925 


30 aa 


SEQ ID NO: 927


MART segment 3 


90 nts 


SEQ ID NO: 928 


Polypeptide encoded by SEQ ID NO: 927 


30 aa 


SEQ ID NO: 929 


MART segment 4 


90 nts 


SEQ ID NO: 930


Polypeptide encoded by SEQ ID NO: 929 


30 aa 


SEQ ID NO: 931 


MART segment 5 


90 nts 


SEQ ID NO: 932 


Polypeptide encoded by SEQ ID NO: 931 


30 aa 


SEQ ID NO: 933 


MART segment 6 


90 nts 


SEQ ID NO: 934 


Polypeptide encoded by SEQ ID NO: 933 


30 aa


SEQ ID NO: 935


MART segment 7 


90 nts 


SEQ ID NO: 936 


Polypeptide encoded by SEQ ID NO: 935 


30 aa 


SEQ ID NO: 937 


MART segment 8 


51 nts 


SEQ ID NO: 938 


Polypeptide encoded by SEQ ID NO: 937 


17 aa 


SEQ ID NO: 939 


trp-1 segment 1 


90 nts 


SEQ ID NO: 940 


Polypeptide encoded by SEQ ID NO: 939 


30 aa 


SEQ ID NO: 941 


trp-1 segment 2 


90 nts 


SEQ ID NO: 942 


Polypeptide encoded by SEQ ID NO: 941 


30 aa 


SEQ ID NO: 943 


trp-1 segment 3 


90 nts 


SEQ ID NO: 944 


Polypeptide encoded by SEQ ID NO: 943 


30 aa 


SEQ ID NO: 945 


trp-1 segment 4 


90 nts 


SEQ ID NO: 946 


Polypeptide encoded by SEQ ID NO: 945 


30 aa 


SEQ ID NO: 947 


trp-1 segment 5 


90 nts 


SEQ ID NO: 948 


Polypeptide encoded by SEQ ID NO: 947 


30 aa 


SEQ ID NO: 949 


trp-1 segment 6 


90 nts 


SEQ ID NO: 950 


Polypeptide encoded by SEQ ID NO: 949 


30 aa 


SEQ ID NO: 951 


trp-1 segment 7 


90 nts


SEQ ID NO: 952 


Polypeptide encoded by SEQ ID NO: 951 


30 aa 




SEQ ID NO: 953


trp-1 segment 8


90 nts 


SEQ ID NO: 954 


Polypeptide encoded by SEQ ID NO: 953 


30 aa 


SEQ ID NO: 955 


trp-1 segment 9 


90 nts 


SEQ ID NO: 956 


Polypeptide encoded by SEQ ID NO: 955 


30 aa 


SEQ ID NO: 957 


trp-1 segment 10 


90 nts 


SEQ ID NO: 958 


Polypeptide encoded by SEQ ID NO: 957 


30 aa


SEQ ID NO: 959


trp-1 segment 11


90 nts


SEQ ID NO: 960 


Polypeptide encoded by SEQ ID NO: 959 


30 aa 


SEQ ID NO: 961


trp-1 segment 12


90 nts


SEQ ID NO: 962


Polypeptide encoded by SEQ ID NO: 961 


30 aa 


SEQ ID NO: 963


trp-1 segment 13


90 nts


SEQ ID NO: 964


Polypeptide encoded by SEQ ID NO: 963 


30 aa 


SEQ ID NO: 965


trp-1 segment 14


90 nts


SEQ ID NO: 966


Polypeptide encoded by SEQ ID NO: 965 


30 aa 


SEQ ID NO: 967


trp-1 segment 15


90 nts


SEQ ID NO: 968


Polypeptide encoded by SEQ ID NO: 967 


30 aa 


SEQ ID NO: 969


trp-1 segment 16


81 nts


SEQ ID NO: 970


Polypeptide encoded by SEQ ID NO: 969 


27 aa 


SEQ ID NO: 971


tyros segment 1


90 nts


SEQ ID NO: 972


Polypeptide encoded by SEQ ID NO: 971 


30 aa 


SEQ ID NO: 973


tyros segment 2


90 nts


SEQ ID NO: 974


Polypeptide encoded by SEQ ID NO: 973 


30 aa 


SEQ ID NO: 975


tyros segment 3


90 nts


SEQ ID NO: 976


Polypeptide encoded by SEQ ID NO: 975 


30 aa 


SEQ ID NO: 977


tyros segment 4


90 nts


SEQ ID NO: 978


Polypeptide encoded by SEQ ID NO: 977 


30 aa 


SEQ ID NO: 979


tyros segment 5


90 nts


SEQ ID NO: 980


Polypeptide encoded by SEQ ID NO: 979 


30 aa 


SEQIDNO: 981 


tyros segment 6 


90 nts 


SEQIDNO: 982 


Polypeptide encoded by SEQ ID NO: 981 


30 aa 
SEQ ID NO: 983


tyros segment 7


90 nts


SEQ ID NO: 984


Polypeptide encoded by SEQ ID NO: 983


30 aa


SEQ ID NO: 985


tyros segment 8


90 nts


SEQ ID NO: 986


Polypeptide encoded by SEQ ID NO: 985


30 aa


SEQ ID NO: 987


tyros segment 9


90 nts


SEQ ID NO: 988


Polypeptide encoded by SEQ ID NO: 987


30 aa


SEQ ID NO: 989


tyros segment 10


90 nts


SEQ ID NO: 990


Polypeptide encoded by SEQ ID NO: 989


30 aa


SEQ ID NO: 991


tyros segment 11


90 nts


SEQ ID NO: 992


Polypeptide encoded by SEQ ID NO: 991


30 aa


SEQ ID NO: 993


tyros segment 12


90 nts


SEQ ID NO: 994


Polypeptide encoded by SEQ ID NO: 993


30 aa


SEQ ID NO: 995


tyros segment 13


90 nts


SEQ ID NO: 996


Polypeptide encoded by SEQ ID NO: 995


30 aa


SEQ ID NO: 997


tyros segment 14


90 nts


SEQ ID NO: 998


Polypeptide encoded by SEQ ID NO: 997


30 aa


SEQ ID NO: 999


tyros segment 15


90 nts


SEQ ID NO: 1000


Polypeptide encoded by SEQ ID NO: 999


30 aa


SEQ ID NO: 1001


tyros segment 16


90 nts


SEQ ID NO: 1002


Polypeptide encoded by SEQ ID NO: 1001


30 aa


SEQ ID NO: 1003


tyros segment 17


90 nts


SEQ ID NO: 1004


Polypeptide encoded by SEQ ID NO: 1003


30 aa


SEQ ID NO: 1005


tyros segment 18


90 nts


SEQ ID NO: 1006


Polypeptide encoded by SEQ ID NO: 1005


30 aa


SEQ ID NO: 1007


tyros segment 19


90 nts


SEQ ID NO: 1008


Polypeptide encoded by SEQ ID NO: 1007


30 aa


SEQ ID NO: 1009


tyros segment 20


90 nts


SEQ ID NO: 1010


Polypeptide encoded by SEQ ID NO: 1009


30 aa


SEQ ID NO: 1011


tyros segment 21


90 nts


SEQ ID NO: 1012


Polypeptide encoded by SEQ ID NO: 1011


30 aa


SEQ ID NO: 1013


tyros segment 22


90 nts


SEQ ID NO: 1014


Polypeptide encoded by SEQ ID NO: 1013


30 aa


SEQ ID NO: 1015


tyros segment 23


90 nts


SEQ ID NO: 1016


Polypeptide encoded by SEQ ID NO: 1015


30 aa


SEQ ID NO: 1017


tyros segment 24


90 nts


SEQ ID NO: 1018


Polypeptide encoded by SEQ ID NO: 1017


30 aa


SEQ ID NO: 1019


tyros segment 25


90 nts


SEQ ID NO: 1020


Polypeptide encoded by SEQ ID NO: 1019


30 aa


SEQ ID NO: 1021


tyros segment 26


90 nts


SEQ ID NO: 1022


Polypeptide encoded by SEQ ID NO: 1021


30 aa


SEQ ID NO: 1023


tyros segment 27


90 nts


SEQ ID NO: 1024


Polypeptide encoded by SEQ ID NO: 1023


30 aa


SEQ ID NO: 1025


tyros segment 28


90 nts


SEQ ID NO: 1026


Polypeptide encoded by SEQ ID NO: 1025


30 aa


SEQ ID NO: 1027


tyros segment 29


90 nts


SEQ ID NO: 1028


Polypeptide encoded by SEQ ID NO: 1027


30 aa


SEQ ID NO: 1029


tyros segment 30


90 nts


SEQ ID NO: 1030


Polypeptide encoded by SEQ ID NO: 1029


30 aa


SEQ ID NO: 1031


tyros segment 31


90 nts


SEQ ID NO: 1032


Polypeptide encoded by SEQ ID NO: 1031


30 aa


SEQ ID NO: 1033


tyros segment 32


90 nts


SEQ ID NO: 1034


Polypeptide encoded by SEQ ID NO: 1033


30 aa


SEQ ID NO: 1035


tyros segment 33


90 nts


SEQ ID NO: 1036


Polypeptide encoded by SEQ ID NO: 1035


30 aa


SEQ ID NO: 1037


tyros segment 34


90 nts


SEQ ID NO: 1038


Polypeptide encoded by SEQ ID NO: 1037


30 aa


SEQ ID NO: 1039


tyros segment 35


69 nts


SEQ ID NO: 1040


Polypeptide encoded by SEQ ID NO: 1039


23 aa


SEQ ID NO: 1041


trp2 segment 1


90 nts


SEQ ID NO: 1042


Polypeptide encoded by SEQ ID NO: 1041


30 aa


SEQ ID NO: 1043


trp2 segment 2


90 nts


SEQ ID NO: 1044


Polypeptide encoded by SEQ ID NO: 1043


30 aa


SEQ ID NO: 1045


trp2 segment 3


90 nts


SEQ ID NO: 1046


Polypeptide encoded by SEQ ID NO: 1045


30 aa


SEQ ID NO: 1047


trp2 segment 4


90 nts


SEQ ID NO: 1048


Polypeptide encoded by SEQ ID NO: 1047


30 aa


SEQ ID NO: 1049


trp2 segment 5


90 nts


SEQ ID NO: 1050


Polypeptide encoded by SEQ ID NO: 1049


30 aa


SEQ ID NO: 1051


trp2 segment 6


90 nts


SEQ ID NO: 1052


Polypeptide encoded by SEQ ID NO: 1051


30 aa


SEQ ID NO: 1053


trp2 segment 7


90 nts


SEQ ID NO: 1054


Polypeptide encoded by SEQ ID NO: 1053


30 aa


SEQ ID NO: 1055


trp2 segment 8


90 nts


SEQ ID NO: 1056


Polypeptide encoded by SEQ ID NO: 1055


30 aa


SEQ ID NO: 1057


trp2 segment 9


90 nts


SEQ ID NO: 1058


Polypeptide encoded by SEQ ID NO: 1057


30 aa


SEQ ID NO: 1059


trp2 segment 10


90 nts


SEQ ID NO: 1060


Polypeptide encoded by SEQ ID NO: 1059


30 aa


SEQ ID NO: 1061


trp2 segment 11


90 nts


SEQ ID NO: 1062


Polypeptide encoded by SEQ ID NO: 1061


30 aa


SEQ ID NO: 1063


trp2 segment 12


90 nts


SEQ ID NO: 1064


Polypeptide encoded by SEQ ID NO: 1063


30 aa


SEQ ID NO: 1065


trp2 segment 13


90 nts


SEQ ID NO: 1066


Polypeptide encoded by SEQ ID NO: 1065


30 aa


SEQ ID NO: 1067


trp2 segment 14


90 nts


SEQ ID NO: 1068


Polypeptide encoded by SEQ ID NO: 1067


30 aa


SEQ ID NO: 1069


trp2 segment 15


90 nts


SEQ ID NO: 1070


Polypeptide encoded by SEQ ID NO: 1069


30 aa


SEQ ID NO: 1071


trp2 segment 16


90 nts


SEQ ID NO: 1072


Polypeptide encoded by SEQ ID NO: 1071


30 aa


SEQ ID NO: 1073


trp2 segment 17


90 nts


SEQ ID NO: 1074


Polypeptide encoded by SEQ ID NO: 1073


30 aa


SEQ ID NO: 1075


trp2 segment 18


90 nts


SEQ ID NO: 1076


Polypeptide encoded by SEQ ID NO: 1075


30 aa


SEQ ID NO: 1077


trp2 segment 19


90 nts


SEQ ID NO: 1078


Polypeptide encoded by SEQ ID NO: 1077


30 aa


SEQ ID NO: 1079


trp2 segment 20


90 nts


SEQ ID NO: 1080


Polypeptide encoded by SEQ ID NO: 1079


30 aa


SEQ ID NO: 1081


trp2 segment 21


90 nts


SEQ ID NO: 1082


Polypeptide encoded by SEQ ID NO: 1081


30 aa


SEQ ID NO: 1083


trp2 segment 22


90 nts


SEQ ID NO: 1084


Polypeptide encoded by SEQ ID NO: 1083


30 aa


SEQ ID NO: 1085


trp2 segment 23


90 nts


SEQ ID NO: 1086


Polypeptide encoded by SEQ ID NO: 1085


30 aa


SEQ ID NO: 1087


trp2 segment 24


90 nts


SEQ ID NO: 1088


Polypeptide encoded by SEQ ID NO: 1087


30 aa


SEQ ID NO: 1089


trp2 segment 25


90 nts


SEQ ID NO: 1090


Polypeptide encoded by SEQ ID NO: 1089


30 aa


SEQ ID NO: 1091


trp2 segment 26


90 nts


SEQ ID NO: 1092


Polypeptide encoded by SEQ ID NO: 1091


30 aa


SEQ ID NO: 1093


trp2 segment 27


90 nts


SEQ ID NO: 1094


Polypeptide encoded by SEQ ID NO: 1093


30 aa


SEQ ID NO: 1095


trp2 segment 28


90 nts


SEQ ID NO: 1096


Polypeptide encoded by SEQ ID NO: 1095


30 aa


SEQ ID NO: 1097


trp2 segment 29


90 nts


SEQ ID NO: 1098


Polypeptide encoded by SEQ ID NO: 1097


30 aa


SEQ ID NO: 1099


trp2 segment 30


90 nts


SEQ ID NO: 1100


Polypeptide encoded by SEQ ID NO: 1099


30 aa


SEQ ID NO: 1101


trp2 segment 31


90 nts


SEQ ID NO: 1102


Polypeptide encoded by SEQ ID NO: 1101


30 aa


SEQ ID NO: 1103


trp2 segment 32


90 nts


SEQ ID NO: 1104


Polypeptide encoded by SEQ ID NO: 1103


30 aa


SEQ ID NO: 1105


trp2 segment 33


90 nts


SEQ ID NO: 1106


Polypeptide encoded by SEQ ID NO: 1105


30 aa


SEQ ID NO: 1107


trp2 segment 34


84 nts


SEQ ID NO: 1108


Polypeptide encoded by SEQ ID NO: 1107


28 aa


SEQ ID NO: 1109


MC1R segment 1


90 nts


SEQ ID NO: 1110


Polypeptide encoded by SEQ ID NO: 1109


30 aa


SEQ ID NO: 1111


MC1R segment 2


90 nts


SEQ ID NO: 1112


Polypeptide encoded by SEQ ID NO: 1111


30 aa


SEQ ID NO: 1113


MC1R segment 3


90 nts


SEQ ID NO: 1114


Polypeptide encoded by SEQ ID NO: 1113


30 aa


SEQ ID NO: 1115


MC1R segment 4


90 nts


SEQ ID NO: 1116


Polypeptide encoded by SEQ ID NO: 1115


30 aa


SEQ ID NO: 1117


MC1R segment 5


90 nts


SEQ ID NO: 1118


Polypeptide encoded by SEQ ID NO: 1117


30 aa


SEQ ID NO: 1119


MC1R segment 6


90 nts


SEQ ID NO: 1120


Polypeptide encoded by SEQ ID NO: 1119


30 aa


SEQ ID NO: 1121


MC1R segment 7


90 nts


SEQ ID NO: 1122


Polypeptide encoded by SEQ ID NO: 1121


30 aa


SEQ ID NO: 1123


MC1R segment 8


90 nts


SEQ ID NO: 1124


Polypeptide encoded by SEQ ID NO: 1123


30 aa


SEQ ID NO: 1125


MC1R segment 9


90 nts


SEQ ID NO: 1126


Polypeptide encoded by SEQ ID NO: 1125


30 aa


SEQ ID NO: 1127


MC1R segment 10


90 nts


SEQ ID NO: 1128


Polypeptide encoded by SEQ ID NO: 1127


30 aa


SEQ ID NO: 1129


MC1R segment 11


90 nts


SEQ ID NO: 1130


Polypeptide encoded by SEQ ID NO: 1129


30 aa


SEQ ID NO: 1131


MC1R segment 12


90 nts


SEQ ID NO: 1132


Polypeptide encoded by SEQ ID NO: 1131


30 aa


SEQ ID NO: 1133


MC1R segment 13


90 nts


SEQ ID NO: 1134


Polypeptide encoded by SEQ ID NO: 1133


30 aa


SEQ ID NO: 1135


MC1R segment 14


90 nts


SEQ ID NO: 1136


Polypeptide encoded by SEQ ID NO: 1135


30 aa


SEQ ID NO: 1137


MC1R segment 15


90 nts


SEQ ID NO: 1138


Polypeptide encoded by SEQ ID NO: 1137


30 aa


SEQ ID NO: 1139


MC1R segment 16


90 nts


SEQ ID NO: 1140


Polypeptide encoded by SEQ ID NO: 1139


30 aa


SEQ ID NO: 1141


MC1R segment 17


90 nts


SEQ ID NO: 1142


Polypeptide encoded by SEQ ID NO: 1141


30 aa


SEQ ID NO: 1143


MC1R segment 18


90 nts


SEQ ID NO: 1144


Polypeptide encoded by SEQ ID NO: 1143


30 aa


SEQ ID NO: 1145


MC1R segment 19


90 nts


SEQ ID NO: 1146


Polypeptide encoded by SEQ ID NO: 1145


30 aa


SEQ ID NO: 1147


MC1R segment 20


90 nts


SEQ ID NO: 1148


Polypeptide encoded by SEQ ID NO: 1147


30 aa


SEQ ID NO: 1149


MC1R segment 21


63 nts


SEQ ID NO: 1150


Polypeptide encoded by SEQ ID NO: 1149


21 aa


SEQ ID NO: 1151


MUC1F segment 1


90 nts


SEQ ID NO: 1152


Polypeptide encoded by SEQ ID NO: 1151


30 aa


SEQ ID NO: 1153


MUC1F segment 2


90 nts


SEQ ID NO: 1154


Polypeptide encoded by SEQ ID NO: 1153


30 aa


SEQ ID NO: 1155


MUC1F segment 3


90 nts


SEQ ID NO: 1156


Polypeptide encoded by SEQ ID NO: 1155


30 aa


SEQ ID NO: 1157


MUC1F segment 4


90 nts


SEQ ID NO: 1158


Polypeptide encoded by SEQ ID NO: 1157


30 aa


SEQ ID NO: 1159


MUC1F segment 5


90 nts


SEQ ID NO: 1160


Polypeptide encoded by SEQ ID NO: 1159


30 aa


SEQ ID NO: 1161


MUC1F segment 6


90 nts


SEQ ID NO: 1162


Polypeptide encoded by SEQ ID NO: 1161


30 aa


SEQ ID NO: 1163


MUC1F segment 7


90 nts


SEQ ID NO: 1164


Polypeptide encoded by SEQ ID NO: 1163


30 aa


SEQ ID NO: 1165


MUC1F segment 8


72 nts


SEQ ID NO: 1166


Polypeptide encoded by SEQ ID NO: 1165


24 aa


SEQ ID NO: 1167


MUC1R segment 1


90 nts


SEQ ID NO: 1168


Polypeptide encoded by SEQ ID NO: 1167


30 aa


SEQ ID NO: 1169


MUC1R segment 2


90 nts


SEQ ID NO: 1170


Polypeptide encoded by SEQ ID NO: 1169


30 aa


SEQ ID NO: 1171


MUC1R segment 3


90 nts


SEQ ID NO: 1172


Polypeptide encoded by SEQ ID NO: 1171


30 aa


SEQ ID NO: 1173


MUC1R segment 4


90 nts


SEQ ID NO: 1174


Polypeptide encoded by SEQ ID NO: 1173


30 aa


SEQ ID NO: 1175


MUC1R segment 5


90 nts


SEQ ID NO: 1176


Polypeptide encoded by SEQ ID NO: 1175


30 aa


SEQ ID NO: 1177


MUC1R segment 6


90 nts


SEQ ID NO: 1178


Polypeptide encoded by SEQ ID NO: 1177


30 aa


SEQ ID NO: 1179


MUC1R segment 7


90 nts


SEQ ID NO: 1180


Polypeptide encoded by SEQ ID NO: 1179


30 aa


SEQ ID NO: 1181


MUC1R segment 8


90 nts


SEQ ID NO: 1182


Polypeptide encoded by SEQ ID NO: 1181


30 aa


SEQ ID NO: 1183


MUC1R segment 9


90 nts


SEQ ID NO: 1184


Polypeptide encoded by SEQ ID NO: 1183


30 aa


SEQ ID NO: 1185


MUC1R segment 10


90 nts


SEQ ID NO: 1186


Polypeptide encoded by SEQ ID NO: 1185


30 aa


SEQ ID NO: 1187


MUC1R segment 11


90 nts


SEQ ID NO: 1188


Polypeptide encoded by SEQ ID NO: 1187


30 aa


SEQ ID NO: 1189


MUC1R segment 12


90 nts


SEQ ID NO: 1190


Polypeptide encoded by SEQ ID NO: 1189


30 aa


SEQ ID NO: 1191


MUC1R segment 13


90 nts


SEQ ID NO: 1192


Polypeptide encoded by SEQ ID NO: 1191


30 aa


SEQ ID NO: 1193


MUC1R segment 14


90 nts


SEQ ID NO: 1194


Polypeptide encoded by SEQ ID NO: 1193


30 aa


SEQ ID NO: 1195


MUC1R segment 15


90 nts


SEQ ID NO: 1196


Polypeptide encoded by SEQ ID NO: 1195


30 aa


SEQ ID NO: 1197


MUC1R segment 16


90 nts


SEQ ID NO: 1198


Polypeptide encoded by SEQ ID NO: 1197


30 aa


SEQ ID NO: 1199


MUC1R segment 17


90 nts


SEQ ID NO: 1200


Polypeptide encoded by SEQ ID NO: 1199


30 aa


SEQ ID NO: 1201


MUC1R segment 18


90 nts


SEQ ID NO: 1202


Polypeptide encoded by SEQ ID NO: 1201


30 aa


SEQ ID NO: 1203


MUC1R segment 19


90 nts


SEQ ID NO: 1204


Polypeptide encoded by SEQ ID NO: 1203


30 aa


SEQ ID NO: 1205


MUC1R segment 20


90 nts


SEQ ID NO: 1206


Polypeptide encoded by SEQ ID NO: 1205


30 aa


SEQ ID NO: 1207


MUC1R segment 21


48 nts


SEQ ID NO: 1208


Polypeptide encoded by SEQ ID NO: 1207


16 aa


SEQ ID NO: 1209


Differentiation Savine


16638 nts


SEQ ID NO: 1210


Polypeptide encoded by SEQ ID NO: 1209


5546 aa


SEQ ID NO: 1211


BAGE segment 1


90 nts


SEQ ID NO: 1212


Polypeptide encoded by SEQ ID NO: 1211


30 aa


SEQ ID NO: 1213


BAGE segment 2


90 nts


SEQ ID NO: 1214


Polypeptide encoded by SEQ ID NO: 1213


30 aa


SEQ ID NO: 1215


BAGE segment 3


51 nts


SEQ ID NO: 1216


Polypeptide encoded by SEQ ID NO: 1215


17 aa


SEQ ID NO: 1217


GAGE-1 segment 1


90 nts


SEQ ID NO: 1218


Polypeptide encoded by SEQ ID NO: 1217


30 aa


SEQ ID NO: 1219


GAGE-1 segment 2


90 nts


SEQ ID NO: 1220


Polypeptide encoded by SEQ ID NO: 1219


30 aa


SEQ ID NO: 1221


GAGE-1 segment 3


90 nts


SEQ ID NO: 1222


Polypeptide encoded by SEQ ID NO: 1221


30 aa


SEQ ID NO: 1223


GAGE-1 segment 4


90 nts


SEQ ID NO: 1224


Polypeptide encoded by SEQ ID NO: 1223


30 aa


SEQ ID NO: 1225


GAGE-1 segment 5


90 nts


SEQ ID NO: 1226


Polypeptide encoded by SEQ ID NO: 1225


30 aa


SEQ ID NO: 1227


GAGE-1 segment 6


90 nts


SEQ ID NO: 1228


Polypeptide encoded by SEQ ID NO: 1227


30 aa


SEQ ID NO: 1229


GAGE-1 segment 7


90 nts


SEQ ID NO: 1230


Polypeptide encoded by SEQ ID NO: 1229


30 aa


SEQ ID NO: 1231


GAGE-1 segment 8


90 nts


SEQ ID NO: 1232


Polypeptide encoded by SEQ ID NO: 1231


30 aa


SEQ ID NO: 1233


GAGE-1 segment 9


66 nts


SEQ ID NO: 1234


Polypeptide encoded by SEQ ID NO: 1233


22 aa


SEQ ID NO: 1235


gp100In4 segment 1


90 nts


SEQ ID NO: 1236


Polypeptide encoded by SEQ ID NO: 1235


30 aa


SEQ ID NO: 1237


gp100In4 segment 2


90 nts


SEQ ID NO: 1238


Polypeptide encoded by SEQ ID NO: 1237


30 aa


SEQ ID NO: 1239


gp100In4 segment 3


75 nts


SEQ ID NO: 1240


Polypeptide encoded by SEQ ID NO: 1239


25 aa


SEQ ID NO: 1241


MAGE-1 segment 1


90 nts


SEQ ID NO: 1242


Polypeptide encoded by SEQ ID NO: 1241


30 aa


SEQ ID NO: 1243


MAGE-1 segment 2


90 nts


SEQ ID NO: 1244


Polypeptide encoded by SEQ ID NO: 1243


30 aa


SEQ ID NO: 1245


MAGE-1 segment 3


90 nts


SEQ ID NO: 1246


Polypeptide encoded by SEQ ID NO: 1245


30 aa


SEQ ID NO: 1247


MAGE-1 segment 4


90 nts


SEQ ID NO: 1248


Polypeptide encoded by SEQ ID NO: 1247


30 aa


SEQ ID NO: 1249


MAGE-1 segment 5


90 nts


SEQ ID NO: 1250


Polypeptide encoded by SEQ ID NO: 1249


30 aa


SEQ ID NO: 1251


MAGE-1 segment 6


90 nts


SEQ ID NO: 1252


Polypeptide encoded by SEQ ID NO: 1251


30 aa


SEQ ID NO: 1253


MAGE-1 segment 7


90 nts


SEQ ID NO: 1254


Polypeptide encoded by SEQ ID NO: 1253


30 aa


SEQ ID NO: 1255


MAGE-1 segment 8


90 nts


SEQ ID NO: 1256


Polypeptide encoded by SEQ ID NO: 1255


30 aa


SEQ ID NO: 1257


MAGE-1 segment 9


90 nts


SEQ ID NO: 1258


Polypeptide encoded by SEQ ID NO: 1257


30 aa


SEQ ID NO: 1259


MAGE-1 segment 10


90 nts


SEQ ID NO: 1260


Polypeptide encoded by SEQ ID NO: 1259


30 aa


SEQ ID NO: 1261


MAGE-1 segment 11


90 nts


SEQ ID NO: 1262


Polypeptide encoded by SEQ ID NO: 1261


30 aa


SEQ ID NO: 1263


MAGE-1 segment 12


90 nts


SEQ ID NO: 1264


Polypeptide encoded by SEQ ID NO: 1263


30 aa


SEQ ID NO: 1265


MAGE-1 segment 13


90 nts


SEQ ID NO: 1266


Polypeptide encoded by SEQ ID NO: 1265


30 aa


SEQ ID NO: 1267


MAGE-1 segment 14


90 nts


SEQ ID NO: 1268


Polypeptide encoded by SEQ ID NO: 1267


30 aa


SEQ ID NO: 1269


MAGE-1 segment 15


90 nts


SEQ ID NO: 1270


Polypeptide encoded by SEQ ID NO: 1269


30 aa


SEQ ID NO: 1271


MAGE-1 segment 16


90 nts


SEQ ID NO: 1272


Polypeptide encoded by SEQ ID NO: 1271


30 aa


SEQ ID NO: 1273


MAGE-1 segment 17


90 nts


SEQ ID NO: 1274


Polypeptide encoded by SEQ ID NO: 1273


30 aa


SEQ ID NO: 1275


MAGE-1 segment 18


90 nts


SEQ ID NO: 1276


Polypeptide encoded by SEQ ID NO: 1275


30 aa


SEQ ID NO: 1277


MAGE-1 segment 19


90 nts


SEQ ID NO: 1278


Polypeptide encoded by SEQ ID NO: 1277


30 aa


SEQ ID NO: 1279


MAGE-1 segment 20


84 nts


SEQ ID NO: 1280


Polypeptide encoded by SEQ ID NO: 1279


28 aa


SEQ ID NO: 1281


MAGE-3 segment 1


90 nts


SEQ ID NO: 1282


Polypeptide encoded by SEQ ID NO: 1281


30 aa


SEQ ID NO: 1283


MAGE-3 segment 2


90 nts


SEQ ID NO: 1284


Polypeptide encoded by SEQ ID NO: 1283


30 aa


SEQ ID NO: 1285


MAGE-3 segment 3


90 nts


SEQ ID NO: 1286


Polypeptide encoded by SEQ ID NO: 1285


30 aa


SEQ ID NO: 1287


MAGE-3 segment 4


90 nts


SEQ ID NO: 1288


Polypeptide encoded by SEQ ID NO: 1287


30 aa


SEQ ID NO: 1289


MAGE-3 segment 5


90 nts


SEQ ID NO: 1290


Polypeptide encoded by SEQ ID NO: 1289


30 aa


SEQ ID NO: 1291


MAGE-3 segment 6


90 nts


SEQ ID NO: 1292


Polypeptide encoded by SEQ ID NO: 1291


30 aa


SEQ ID NO: 1293


MAGE-3 segment 7


90 nts


SEQ ID NO: 1294


Polypeptide encoded by SEQ ID NO: 1293


30 aa


SEQ ID NO: 1295


MAGE-3 segment 8


90 nts


SEQ ID NO: 1296


Polypeptide encoded by SEQ ID NO: 1295


30 aa


SEQ ID NO: 1297


MAGE-3 segment 9


90 nts


SEQ ID NO: 1298


Polypeptide encoded by SEQ ID NO: 1297


30 aa


SEQ ID NO: 1299


MAGE-3 segment 10


90 nts


SEQ ID NO: 1300


Polypeptide encoded by SEQ ID NO: 1299


30 aa


SEQ ID NO: 1301


MAGE-3 segment 11


90 nts


SEQ ID NO: 1302


Polypeptide encoded by SEQ ID NO: 1301


30 aa


SEQ ID NO: 1303


MAGE-3 segment 12


90 nts


SEQ ID NO: 1304


Polypeptide encoded by SEQ ID NO: 1303


30 aa


SEQ ID NO: 1305


MAGE-3 segment 13


90 nts


SEQ ID NO: 1306


Polypeptide encoded by SEQ ID NO: 1305


30 aa


SEQ ID NO: 1307


MAGE-3 segment 14


90 nts


SEQ ID NO: 1308


Polypeptide encoded by SEQ ID NO: 1307


30 aa


SEQ ID NO: 1309


MAGE-3 segment 15


90 nts


SEQ ID NO: 1310


Polypeptide encoded by SEQ ID NO: 1309


30 aa


SEQ ID NO: 1311


MAGE-3 segment 16


90 nts


SEQ ID NO: 1312


Polypeptide encoded by SEQ ID NO: 1311


30 aa


SEQ ID NO: 1313


MAGE-3 segment 17


90 nts


SEQ ID NO: 1314


Polypeptide encoded by SEQ ID NO: 1313


30 aa


SEQ ID NO: 1315


MAGE-3 segment 18


90 nts


SEQ ID NO: 1316


Polypeptide encoded by SEQ ID NO: 1315


30 aa


SEQ ID NO: 1317


MAGE-3 segment 19


90 nts


SEQ ID NO: 1318


Polypeptide encoded by SEQ ID NO: 1317


30 aa


SEQ ID NO: 1319


MAGE-3 segment 20


90 nts


SEQ ID NO: 1320


Polypeptide encoded by SEQ ID NO: 1319


30 aa


SEQ ID NO: 1321


MAGE-3 segment 21


54 nts


SEQ ID NO: 1322


Polypeptide encoded by SEQ ID NO: 1321


18 aa


SEQ ID NO: 1323


PRAME segment 1


90 nts


SEQ ID NO: 1324


Polypeptide encoded by SEQ ID NO: 1323


30 aa


SEQ ID NO: 1325


PRAME segment 2


90 nts


SEQ ID NO: 1326


Polypeptide encoded by SEQ ID NO: 1325


30 aa


SEQ ID NO: 1327


PRAME segment 3


90 nts


SEQ ID NO: 1328


Polypeptide encoded by SEQ ID NO: 1327


30 aa


SEQ ID NO: 1329


PRAME segment 4


90 nts


SEQ ID NO: 1330


Polypeptide encoded by SEQ ID NO: 1329


30 aa


SEQ ID NO: 1331


PRAME segment 5


90 nts


SEQ ID NO: 1332


Polypeptide encoded by SEQ ID NO: 1331


30 aa


SEQ ID NO: 1333


PRAME segment 6


90 nts


SEQ ID NO: 1334


Polypeptide encoded by SEQ ID NO: 1333


30 aa


SEQ ID NO: 1335


PRAME segment 7


90 nts


SEQ ID NO: 1336


Polypeptide encoded by SEQ ID NO: 1335


30 aa


SEQ ID NO: 1337


PRAME segment 8


90 nts


SEQ ID NO: 1338


Polypeptide encoded by SEQ ID NO: 1337


30 aa


SEQ ID NO: 1339


PRAME segment 9


90 nts


SEQ ID NO: 1340


Polypeptide encoded by SEQ ID NO: 1339


30 aa


SEQ ID NO: 1341


PRAME segment 10


90 nts


SEQ ID NO: 1342


Polypeptide encoded by SEQ ID NO: 1341


30 aa


SEQ ID NO: 1343


PRAME segment 11


90 nts


SEQ ID NO: 1344


Polypeptide encoded by SEQ ID NO: 1343


30 aa


SEQ ID NO: 1345


PRAME segment 12


90 nts


SEQ ID NO: 1346


Polypeptide encoded by SEQ ID NO: 1345


30 aa


SEQ ID NO: 1347


PRAME segment 13


90 nts


SEQ ID NO: 1348


Polypeptide encoded by SEQ ID NO: 1347


30 aa


SEQ ID NO: 1349


PRAME segment 14


90 nts


SEQ ID NO: 1350


Polypeptide encoded by SEQ ID NO: 1349


30 aa


SEQ ID NO: 1351


PRAME segment 15


90 nts


SEQ ID NO: 1352


Polypeptide encoded by SEQ ID NO: 1351


30 aa


SEQ ID NO: 1353


PRAME segment 16


90 nts


SEQ ID NO: 1354


Polypeptide encoded by SEQ ID NO: 1353


30 aa


SEQ ID NO: 1355


PRAME segment 17


90 nts


SEQ ID NO: 1356


Polypeptide encoded by SEQ ID NO: 1355


30 aa


SEQ ID NO: 1357


PRAME segment 18


90 nts


SEQ ID NO: 1358


Polypeptide encoded by SEQ ID NO: 1357


30 aa


SEQ ID NO: 1359


PRAME segment 19


90 nts


SEQ ID NO: 1360


Polypeptide encoded by SEQ ID NO: 1359


30 aa


SEQ ID NO: 1361


PRAME segment 20


90 nts


SEQ ID NO: 1362


Polypeptide encoded by SEQ ID NO: 1361


30 aa


SEQ ID NO: 1363


PRAME segment 21


90 nts


SEQ ID NO: 1364


Polypeptide encoded by SEQ ID NO: 1363


30 aa


SEQ ID NO: 1365


PRAME segment 22


90 nts


SEQ ID NO: 1366


Polypeptide encoded by SEQ ID NO: 1365


30 aa


SEQ ID NO: 1367


PRAME segment 23


90 nts


SEQ ID NO: 1368


Polypeptide encoded by SEQ ID NO: 1367


30 aa


SEQ ID NO: 1369


PRAME segment 24


90 nts


SEQ ID NO: 1370


Polypeptide encoded by SEQ ID NO: 1369


30 aa


SEQ ID NO: 1371


PRAME segment 25


90 nts


SEQ ID NO: 1372


Polypeptide encoded by SEQ ID NO: 1371


30 aa


SEQ ID NO: 1373


PRAME segment 26


90 nts


SEQ ID NO: 1374


Polypeptide encoded by SEQ ID NO: 1373


30 aa


SEQ ID NO: 1375


PRAME segment 27


90 nts


SEQ ID NO: 1376


Polypeptide encoded by SEQ ID NO: 1375


30 aa


SEQ ID NO: 1377


PRAME segment 28


90 nts


SEQ ID NO: 1378


Polypeptide encoded by SEQ ID NO: 1377


30 aa


SEQ ID NO: 1379


PRAME segment 29


90 nts


SEQ ID NO: 1380


Polypeptide encoded by SEQ ID NO: 1379


30 aa


SEQ ID NO: 1381


PRAME segment 30


90 nts


SEQ ID NO: 1382


Polypeptide encoded by SEQ ID NO: 1381


30 aa


SEQ ID NO: 1383


PRAME segment 31


90 nts


SEQ ID NO: 1384


Polypeptide encoded by SEQ ID NO: 1383


30 aa


SEQ ID NO: 1385


PRAME segment 32


90 nts


SEQ ID NO: 1386


Polypeptide encoded by SEQ ID NO: 1385


30 aa


SEQ ID NO: 1387


PRAME segment 33


90 nts


SEQ ID NO: 1388


Polypeptide encoded by SEQ ID NO: 1387


30 aa


SEQ ID NO: 1389


PRAME segment 34


54 nts


SEQ ID NO: 1390


Polypeptide encoded by SEQ ID NO: 1389


18 aa


SEQ ID NO: 1391


TRP2IN2 segment 1


90 nts


SEQ ID NO: 1392


Polypeptide encoded by SEQ ID NO: 1391


30 aa


SEQ ID NO: 1393


TRP2IN2 segment 2


90 nts


SEQ ID NO: 1394


Polypeptide encoded by SEQ ID NO: 1393


30 aa


SEQ ID NO: 1395


TRP2IN2 segment 3


84 nts


SEQ ID NO: 1396


Polypeptide encoded by SEQ ID NO: 1395


28 aa


SEQ ID NO: 1397


NYNSO1a segment 1


90 nts


SEQ ID NO: 1398


Polypeptide encoded by SEQ ID NO: 1397


30 aa


SEQ ID NO: 1399


NYNSO1a segment 2


90 nts


SEQ ID NO: 1400


Polypeptide encoded by SEQ ID NO: 1399


30 aa


SEQ ID NO: 1401


NYNSO1a segment 3


90 nts


SEQ ID NO: 1402


Polypeptide encoded by SEQ ID NO: 1401


30 aa


SEQ ID NO: 1403


NYNSO1a segment 4


90 nts


SEQ ID NO: 1404


Polypeptide encoded by SEQ ID NO: 1403


30 aa


SEQ ID NO: 1405


NYNSO1a segment 5


90 nts


SEQ ID NO: 1406


Polypeptide encoded by SEQ ID NO: 1405


30 aa


SEQ ID NO: 1407


NYNSO1a segment 6


90 nts


SEQ ID NO: 1408


Polypeptide encoded by SEQ ID NO: 1407


30 aa


SEQ ID NO: 1409


NYNSO1a segment 7


90 nts


SEQ ID NO: 1410


Polypeptide encoded by SEQ ID NO: 1409


30 aa


SEQ ID NO: 1411


NYNSO1a segment 8


90 nts


SEQ ID NO: 1412


Polypeptide encoded by SEQ ID NO: 1411


30 aa


SEQ ID NO: 1413


NYNSO1a segment 9


90 nts


SEQ ID NO: 1414


Polypeptide encoded by SEQ ID NO: 1413


30 aa


SEQ ID NO: 1415


NYNSO1a segment 10


90 nts


SEQ ID NO: 1416


Polypeptide encoded by SEQ ID NO: 1415


30 aa


SEQ ID NO: 1417


NYNSO1a segment 11


90 nts


SEQ ID NO: 1418


Polypeptide encoded by SEQ ID NO: 1417


30 aa


SEQ ID NO: 1419


NYNSO1a segment 12


57 nts


SEQ ID NO: 1420


Polypeptide encoded by SEQ ID NO: 1419


19 aa


SEQ ID NO: 1421


NYNSO1b segment 1


90 nts


SEQ ID NO: 1422


Polypeptide encoded by SEQ ID NO: 1421


30 aa


SEQ ID NO: 1423


NYNSO1b segment 2


90 nts


SEQ ID NO: 1424


Polypeptide encoded by SEQ ID NO: 1423


30 aa


SEQ ID NO: 1425


NYNSO1b segment 3


90 nts


SEQ ID NO: 1426


Polypeptide encoded by SEQ ID NO: 1425


30 aa


SEQ ID NO: 1427


NYNSO1b segment 4


51 nts


SEQ ID NO: 1428


Polypeptide encoded by SEQ ID NO: 1427


17 aa


SEQ ID NO: 1429


LAGE1 segment 1


90 nts


SEQ ID NO: 1430


Polypeptide encoded by SEQ ID NO: 1429


30 aa


SEQ ID NO: 1431


LAGE1 segment 2


90 nts


SEQ ID NO: 1432


Polypeptide encoded by SEQ ID NO: 1431


30 aa


SEQ ID NO: 1433


LAGE1 segment 3


90 nts


SEQ ID NO: 1434


Polypeptide encoded by SEQ ID NO: 1433


30 aa


SEQ ID NO: 1435


LAGE1 segment 4


90 nts


SEQ ID NO: 1436


Polypeptide encoded by SEQ ID NO: 1435


30 aa


SEQ ID NO: 1437


LAGE1 segment 5


90 nts


SEQ ID NO: 1438


Polypeptide encoded by SEQ ID NO: 1437


30 aa


SEQ ID NO: 1439


LAGE1 segment 6


90 nts


SEQ ID NO: 1440


Polypeptide encoded by SEQ ID NO: 1439


30 aa


SEQ ID NO: 1441


LAGE1 segment 7


90 nts


SEQ ID NO: 1442


Polypeptide encoded by SEQ ID NO: 1441


30 aa


SEQ ID NO: 1443


LAGE1 segment 8


90 nts


SEQ ID NO: 1444


Polypeptide encoded by SEQ ID NO: 1443


30 aa


SEQ ID NO: 1445


LAGE1 segment 9


90 nts


SEQ ID NO: 1446


Polypeptide encoded by SEQ ID NO: 1445


30 aa


SEQ ID NO: 1447


LAGE1 segment 10


90 nts


SEQ ID NO: 1448


Polypeptide encoded by SEQ ID NO: 1447


30 aa


SEQ ID NO: 1449


LAGE1 segment 11


90 nts


SEQ ID NO: 1450


Polypeptide encoded by SEQ ID NO: 1449


30 aa


SEQ ID NO: 1451


LAGE1 segment 12


57 nts


SEQ ID NO: 1452


Polypeptide encoded by SEQ ID NO: 1451


19 aa


SEQ ID NO: 1453


Melanoma cancer specific Savine


10623 nts


SEQ ID NO: 1454


Polypeptide encoded by SEQ ID NO: 1453


3541 aa


SEQ ID NO: 1455


Figure 16 A1S1 99mer


99 nts


SEQ ID NO: 1456


Figure 16 A1S2 100mer


100 nts


SEQ ID NO: 1457


Figure 16 A1S3 100mer


100 nts


SEQ ID NO: 1458


Figure 16 A1S4 100mer


100 nts


SEQ ID NO: 1459


Figure 16 A1S5 100mer


100 nts


SEQ ID NO: 1460


Figure 16 A1S6 99mer


99 nts


SEQ ID NO: 1461


Figure 16 A1S7 97mer


99 nts


SEQ ID NO: 1462


Figure 16 A1S8 100mer


100 nts


SEQ ID NO: 1463


Figure 16 A1S9 100mer


100 nts


SEQ ID NO: 1464


Figure 16 A1S10 75mer


76 nts


SEQ ID NO: 1465


Figure 16 A1F 20mer


20 nts


SEQ ID NO: 1466


Figure 16 A1R 20mer


20 nts


SEQ ID NO: 1467


Amino acid sequence of immunostimulatory domain of an invasin protein from Yersinia spp.


16 aa





WO 01/090197 PCT/AU01/00622

DETAILED DESCRIPTION OF THE INVENTION 

1. Definitions

The articles "a" and "an" are used herein to refer to one or to more than one (i.e.,
to at least one) of the grammatical object of the article. By way of example, "an element"
means one element or more than one element.

As used herein, the term "about" refers to a quantity, level, value, dimension,
size, or amount that varies by as much as 30%, preferably by as much as 20%, and more 
preferably by as much as 10% to a reference quantity, level, value, dimension, size, or 
amount. 

By "antigen-binding molecule" is meant a molecule that has binding affinity for a
target antigen. It will be understood that this term extends to immunoglobulins,
immunoglobulin fragments and non-immunoglobulin derived protein frameworks that 
exhibit antigen-binding activity. 

The term "clade" as used herein refers to a hypothetical species of an organism
and its descendants or a monophyletic group of organisms. Clades carry a definition, based
on ancestry, and a diagnosis, based on synapomorphies. It should be noted that diagnoses
of clades could change while definitions do not.

Throughout this specification, unless the context requires otherwise, the words
"comprise", "comprises" and "comprising" will be understood to imply the inclusion of a
stated step or element or group of steps or elements but not the exclusion of any other step
or element or group of steps or elements.

By "expression vector" is meant any autonomous genetic element capable of 
directing the synthesis of a protein encoded by the vector. Such expression vectors are 
known by practitioners in the art.

As used herein, the term "function" refers to a biological, enzymatic, or
therapeutic function.

"Homology" refers to the percentage number of amino acids that are identical or
constitute conservative substitutions as defined in Table B infra. Homology may be
determined using sequence comparison programs such as GAP (Devereux et al., 1984,
Nucleic Acids Research 12, 387-395). In this way, sequences of a similar or substantially
different length to those cited herein might be compared by insertion of gaps into the
alignment, such gaps being determined, for example, by the comparison algorithm used by
GAP.

To enhance an immune response ("immunoenhancement"), as is well-known in
the art, means to increase an animal's capacity to respond to foreign or disease-specific
antigens (e.g., cancer antigens), i.e., those cells primed to attack such antigens are increased
in number, activity, and ability to detect and destroy those antigens. Strength of
immune response is measured by standard tests including: direct measurement of
peripheral blood lymphocytes by means known to the art; natural killer cell cytotoxicity
assays (see, e.g., Provinciali M. et al. (1992, J. Immunol. Meth. 155: 19-24)); cell
proliferation assays (see, e.g., Vollenweider, I. and Groscurth, P. J. (1992, J. Immunol.
Meth. 149: 133-135)); immunoassays of immune cells and subsets (see, e.g., Loeffler, D.
A., et al. (1992, Cytom. 13: 169-174); Rivoltini, L., et al. (1992, Can. Immunol.
Immunother. 34: 241-251)); or skin tests for cell-mediated immunity (see, e.g., Chang, A.
E. et al. (1993, Cancer Res. 53: 1043-1050)). Any statistically significant increase in
strength of immune response as measured by the foregoing tests is considered "enhanced
immune response", "immunoenhancement" or "immunopotentiation" as used herein.
Enhanced immune response is also indicated by physical manifestations such as fever and
inflammation, as well as healing of systemic and local infections, and reduction of
symptoms in disease, i.e., decrease in tumour size, alleviation of symptoms of a disease or
condition including, but not restricted to, leprosy, tuberculosis, malaria, aphthous ulcers,
herpetic and papillomatous warts, gingivitis, atherosclerosis, the concomitants of AIDS
such as Kaposi's sarcoma, bronchial infections, and the like. Such physical manifestations
also define "enhanced immune response", "immunoenhancement" or
"immunopotentiation" as used herein.

Reference herein to "immuno-interactive" includes reference to any interaction,
reaction, or other form of association between molecules and in particular where one of the
molecules is, or mimics, a component of the immune system.

By "isolated" is meant material that is substantially or essentially free from
components that normally accompany it in its native state. 

By "modulating" is meant increasing or decreasing, either directly or indirectly, 
an immune response against a target antigen of a member selected from the group 
consisting of a cancer and an organism, preferably a pathogenic organism.

By "natural gene" is meant a gene that naturally encodes a protein. 

The term "natural polypeptide" as used herein refers to a polypeptide that exists 
in nature. 

By "obtained from" is meant that a sample such as, for example, a polynucleotide
extract or polypeptide extract is isolated from, or derived from, a particular source of the
host. For example, the extract can be obtained from a tissue or a biological fluid isolated 
directly from the host. 

The term "oligonucleotide" as used herein refers to a polymer composed of a
multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related
structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or
related structural variants or synthetic analogues thereof). Thus, while the term
"oligonucleotide" typically refers to a nucleotide polymer in which the nucleotide residues
and linkages between them are naturally occurring, it will be understood that the term also
includes within its scope various analogues including, but not restricted to, peptide nucleic
acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl
ribonucleic acids, and the like. The exact size of the molecule can vary depending on the
particular application. An oligonucleotide is typically rather short in length, generally from
about 10 to 30 nucleotide residues, but the term can refer to molecules of any length,
although the term "polynucleotide" or "nucleic acid" is typically used for large
oligonucleotides.

By "operably linked" is meant that transcriptional and translational regulatory 
polynucleotides are positioned relative to a polypeptide-encoding polynucleotide in such a 
manner that the polynucleotide is transcribed and the polypeptide is translated.

The term "parent polypeptide" as used herein typically refers to a polypeptide
encoded by a natural gene. However, it is possible that the parent polypeptide corresponds
to a protein that is not naturally-occurring but has been engineered using recombinant
techniques. In this instance, a polynucleotide encoding the parent polypeptide may
comprise different but synonymous codons relative to a natural gene encoding the same
polypeptide. Alternatively, the parent polypeptide may not correspond to a natural 
polypeptide sequence. For example, the parent polypeptide may comprise one or more 
consensus sequences common to a plurality of polypeptides. 

The term "patient" refers to patients of human or other mammalian origin and includes any
individual it is desired to examine or treat using the methods of the invention. However, it
will be understood that "patient" does not imply that symptoms are present. Suitable
mammals that fall within the scope of the invention include, but are not restricted to,
primates, livestock animals (e.g., sheep, cows, horses, donkeys, pigs), laboratory test
animals (e.g., rabbits, mice, rats, guinea pigs, hamsters), companion animals (e.g., cats,
dogs) and captive wild animals (e.g., foxes, deer, dingoes).

By "pharmaceutically-acceptable carrier" is meant a solid or liquid filler, diluent 
or encapsulating substance that can be safely used in topical or systemic administration to a 
mammal. 

"Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to 
a polymer of amino acid residues and to variants and synthetic analogues of the same.
Thus, these terms apply to amino acid polymers in which one or more amino acid residues 
is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a 
corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid 
polymers. 

The term "polynucleotide" or "nucleic acid" as used herein designates mRNA,
RNA, cRNA, cDNA or DNA. The term typically refers to oligonucleotides greater than 30
nucleotide residues in length. 

By "primer" is meant an oligonucleotide which, when paired with a strand of
DNA, is capable of initiating the synthesis of a primer extension product in the presence of
a suitable polymerising agent. The primer is preferably single-stranded for maximum
efficiency in amplification but can alternatively be double-stranded. A primer must be
sufficiently long to prime the synthesis of extension products in the presence of the
polymerisation agent. The length of the primer depends on many factors, including
application, temperature to be employed, template reaction conditions, other reagents, and
source of primers. For example, depending on the complexity of the target sequence, the
oligonucleotide primer typically contains 15 to 35 or more nucleotide residues, although it
can contain fewer nucleotide residues. Primers can be large polynucleotides, such as from
about 35 nucleotides to several kilobases or more. Primers can be selected to be
"substantially complementary" to the sequence on the template to which they are designed to
hybridise and serve as a site for the initiation of synthesis. By "substantially
complementary", it is meant that the primer is sufficiently complementary to hybridise
with a target polynucleotide. Preferably, the primer contains no mismatches with the
template to which it is designed to hybridise but this is not essential. For example,
non-complementary nucleotide residues can be attached to the 5' end of the primer, with the
remainder of the primer sequence being complementary to the template. Alternatively,
non-complementary nucleotide residues or a stretch of non-complementary nucleotide
residues can be interspersed into a primer, provided that the primer sequence has sufficient
complementarity with the sequence of the template to hybridise therewith and thereby form
a template for synthesis of the extension product of the primer.

"Probe" refers to a molecule that binds to a specific sequence or sub-sequence or
other moiety of another molecule. Unless otherwise indicated, the term "probe" typically
refers to a polynucleotide probe that binds to another polynucleotide, often called the
"target polynucleotide", through complementary base pairing. Probes can bind target
polynucleotides lacking complete sequence complementarity with the probe, depending on
the stringency of the hybridisation conditions. Probes can be labelled directly or indirectly.

By "recombinant polypeptide" is meant a polypeptide made using recombinant 
techniques, i.e., through the expression of a recombinant or synthetic polynucleotide. 

Terms used to describe sequence relationships between two or more
polynucleotides or polypeptides include "reference sequence", "comparison window",
"sequence identity", "percentage of sequence identity" and "substantial identity". A
"reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer
units, inclusive of nucleotides and amino acid residues, in length. Because two
polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete
polynucleotide sequence) that is similar between the two polynucleotides, and (2) a
sequence that is divergent between the two polynucleotides, sequence comparisons
between two (or more) polynucleotides are typically performed by comparing sequences of
the two polynucleotides over a "comparison window" to identify and compare local
regions of sequence similarity. A "comparison window" refers to a conceptual segment of
at least 50 contiguous positions, usually about 50 to about 100, more usually about 100 to
about 150 in which a sequence is compared to a reference sequence of the same number of
contiguous positions after the two sequences are optimally aligned. The comparison
window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared
to the reference sequence (which does not comprise additions or deletions) for optimal
alignment of the two sequences. Optimal alignment of sequences for aligning a comparison
window may be conducted by computerised implementations of algorithms (GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release
7.0, Genetics Computer Group, 575 Science Drive, Madison, WI, USA) or by inspection
and the best alignment (i.e., resulting in the highest percentage homology over the
comparison window) generated by any of the various methods selected. Reference also
may be made to the BLAST family of programs as for example disclosed by Altschul et
al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be
found in Unit 19.3 of Ausubel et al., "Current Protocols in Molecular Biology", John
Wiley & Sons Inc, 1994-1998, Chapter 15.

The term "sequence identity" as used herein refers to the extent that sequences
are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis
over a window of comparison. Thus, a "percentage of sequence identity" is calculated by
comparing two optimally aligned sequences over the window of comparison, determining
the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the
identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys,
Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number
of matched positions, dividing the number of matched positions by the total number of
positions in the window of comparison (i.e., the window size), and multiplying the result
by 100 to yield the percentage of sequence identity. For the purposes of the present
invention, "sequence identity" will be understood to mean the "match percentage"
calculated by the DNASIS computer program (Version 2.5 for Windows; available from
Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA) using
standard defaults as used in the reference manual accompanying the software.
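
By way of illustration only, the matched-positions calculation described above can be sketched in Python. This is a minimal sketch of the generic formula and is not the DNASIS "match percentage" routine; the function name percent_identity, the use of "-" as a gap symbol, and the assumption that both inputs are already optimally aligned over the same window are all choices made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a window of comparison.

    Both inputs are assumed to be already aligned and of equal length,
    with '-' marking a gap position (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    # Count positions where the same residue occurs in both sequences.
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    # Divide by the window size and multiply by 100.
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 7 of 8 positions match: 87.5
```

Gap positions are counted in the window size but never as matches, which is one possible convention; an alignment program may treat gaps differently.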

The term "synthetic polynucleotide" as used herein refers to a polynucleotide
formed in vitro by the manipulation of a polynucleotide into a form not normally found in
nature. For example, the synthetic polynucleotide can be in the form of an expression
vector. Generally, such expression vectors include transcriptional and translational
regulatory polynucleotides operably linked to the polynucleotide.

The term "synonymous codon" as used herein refers to a codon having a different
nucleotide sequence than another codon but encoding the same amino acid as that other
codon. 
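
The definition of a synonymous codon can be illustrated with a short Python sketch. It assumes the standard genetic code (other translation tables exist), treats U and T as equivalent, and excludes stop codons because they do not encode an amino acid; the names CODON_TABLE, are_synonymous and synonymous_codons are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def _normalise(codon: str) -> str:
    """Upper-case the codon and map RNA 'U' to DNA 'T'."""
    return codon.upper().replace("U", "T")


def are_synonymous(codon_a: str, codon_b: str) -> bool:
    """True if the codons differ in sequence but encode the same amino acid."""
    a, b = _normalise(codon_a), _normalise(codon_b)
    amino_acid = CODON_TABLE[a]
    return a != b and amino_acid != "*" and amino_acid == CODON_TABLE[b]


def synonymous_codons(codon: str) -> list:
    """All codons synonymous with the given codon, in sorted order."""
    target = _normalise(codon)
    return sorted(c for c in CODON_TABLE if are_synonymous(target, c))


print(synonymous_codons("GAA"))  # ['GAG'] (both encode Glu)
```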

By "translational efficiency" is meant the efficiency of a cell's protein synthesis 
machinery to incorporate the amino acid encoded by a codon into a nascent polypeptide 
chain. This efficiency can be evidenced, for example, by the rate at which the cell is able to
synthesise the polypeptide from an RNA template comprising the codon, or by the amount 
of the polypeptide synthesised from such a template. 

By "vector" is meant a polynucleotide molecule, preferably a DNA molecule
derived, for example, from a plasmid, bacteriophage, yeast or virus, into which a
polynucleotide can be inserted or cloned. A vector preferably contains one or more unique
restriction sites and can be capable of autonomous replication in a defined host cell
including a target cell or tissue or a progenitor cell or tissue thereof, or be integrable with
the genome of the defined host such that the cloned sequence is reproducible. Accordingly,
the vector can be an autonomously replicating vector, i.e., a vector that exists as an
extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g., a linear or closed circular plasmid, an extrachromosomal element, a
minichromosome, or an artificial chromosome. The vector can contain any means for
assuring self-replication. Alternatively, the vector can be one which, when introduced into
the host cell, is integrated into the genome and replicated together with the chromosome(s)
into which it has been integrated. A vector system can comprise a single vector or plasmid,
two or more vectors or plasmids, which together contain the total DNA to be introduced
into the genome of the host cell, or a transposon. The choice of the vector will typically
depend on the compatibility of the vector with the host cell into which the vector is to be
introduced. In the present case, the vector is preferably a viral or viral-derived vector,
which is operably functional in animal and preferably mammalian cells. Such a vector may
be derived from a poxvirus, an adenovirus or yeast. The vector can also include a selection
marker such as an antibiotic resistance gene that can be used for selection of suitable
transformants. Examples of such resistance genes are known to those of skill in the art and
include the nptII gene that confers resistance to the antibiotics kanamycin and G418
(Geneticin®) and the hph gene which confers resistance to the antibiotic hygromycin B.

2. Synthetic polypeptides

The inventors have surprisingly discovered that the structure of a parent
polypeptide can be disrupted sufficiently to impede, abrogate or otherwise alter at least one
function of the parent polypeptide, while simultaneously minimising the destruction of
potentially useful epitopes that are present in the parent polypeptide, by fusing, coupling or
otherwise linking together different segments of the parent polypeptide in a different
relationship relative to their linkage in the parent polypeptide. As a result of this change in
relationship, the sequence of the linked segments in the resulting synthetic polypeptide is
different to a sequence contained within the parent polypeptide. The synthetic polypeptides
of the invention are useful as immunopotentiating agents, and are referred to elsewhere in
the specification as scrambled antigen vaccines, super attenuated vaccines or "Savines".

Thus, the invention broadly resides in a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein said segments 
are linked together in a different relationship relative to their linkage in the at least one 
parent polypeptide.

It is preferable but not essential that the segments in said synthetic polypeptide are
linked sequentially in a different order or arrangement relative to that of corresponding
segments in said at least one parent polypeptide. For example, in the case of a parent
polypeptide that comprises four contiguous or overlapping segments A-B-C-D, these
segments may be linked in 23 other possible orders to form a synthetic polypeptide. These
orders may be selected from the group consisting of: A-B-D-C, A-C-B-D, A-C-D-B,
A-D-B-C, A-D-C-B, B-A-C-D, B-A-D-C, B-C-A-D, B-C-D-A, B-D-A-C, B-D-C-A, C-A-B-D,
C-A-D-B, C-B-A-D, C-B-D-A, C-D-A-B, C-D-B-A, D-A-B-C, D-A-C-B, D-B-A-C,
D-B-C-A, D-C-A-B, and D-C-B-A. Although the rearrangement of the segments is preferably
random, it is especially preferable to exclude or otherwise minimise rearrangements that
result in complete or partial reassembly of the parent sequence (e.g., ADBC, BACD,
DABC). It will be appreciated, however, that the probability of such complete or partial
reassembly diminishes as the number of segments for rearrangement increases.
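
The enumeration above can be reproduced with a short Python sketch. It is offered only as an illustration: the helper names rearrangements and retains_parent_adjacency are hypothetical, and "partial reassembly" is interpreted here, as an assumption, to mean that at least one pair of segments that is adjacent in the parent remains adjacent and in the same orientation in the rearranged order.

```python
from itertools import permutations


def rearrangements(parent_order):
    """All orders of the segments other than the parent order itself."""
    parent = tuple(parent_order)
    return [p for p in permutations(parent) if p != parent]


def retains_parent_adjacency(order, parent_order):
    """True if any pair adjacent in the parent is still adjacent, in the
    same orientation, in the rearranged order (e.g. 'BC' within ADBC)."""
    parent_pairs = set(zip(parent_order, parent_order[1:]))
    return any(pair in parent_pairs for pair in zip(order, order[1:]))


parent = "ABCD"
others = rearrangements(parent)
scrambled = [o for o in others if not retains_parent_adjacency(o, parent)]

print(len(others))     # 23 orders other than A-B-C-D
print(len(scrambled))  # 11 orders with no parent junction preserved
print(["-".join(o) for o in scrambled])
```

Under this interpretation the three orders cited as examples of partial reassembly (ADBC, BACD, DABC) are all excluded, since each preserves at least one junction of the parent (B-C, C-D and A-B-C, respectively).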

The order of the segments is suitably shuffled, reordered or otherwise rearranged
relative to the order in which they exist in the parent polypeptide so that the structure of the
polypeptide is disrupted sufficiently to impede, abrogate or otherwise alter at least one
function associated with the parent polypeptide. Preferably, the segments of the parent
polypeptide are randomly rearranged in the synthetic polypeptide.

The parent polypeptide is suitably a polypeptide that is associated with a disease
or condition. For example, the parent polypeptide may be a polypeptide expressed by a
pathogenic organism or a cancer. Alternatively, the parent polypeptide can be a self
peptide related to an autoimmune disease including, but not limited to, diseases such as
diabetes (e.g., juvenile diabetes), multiple sclerosis, rheumatoid arthritis, myasthenia
gravis, atopic dermatitis, psoriasis and ankylosing spondylitis. Accordingly, the
synthetic molecules of the present invention may also have utility for the induction of
tolerance in a subject afflicted with an autoimmune disease or condition or with an allergy
or other condition to which tolerance is desired. For example, tolerance may be induced by
contacting an immature dendritic cell of the individual to be treated with a synthetic
polypeptide of the invention or by expressing in an immature dendritic cell a synthetic
polynucleotide of the invention. Tolerance may also be induced against antigens causing
allergic responses (e.g., asthma, hay fever). In this case, the parent polypeptide is suitably
an allergenic protein including, but not restricted to, house-dust-mite allergenic proteins as
for example described by Thomas and Smith (1998, Allergy, 53(9): 821-832).

The pathogenic organism includes, but is not restricted to, yeast, a virus, a
bacterium, and a parasite. Any natural host of the pathogenic organism is contemplated by
the present invention and includes, but is not limited to, mammals, avians and fish. In a
preferred embodiment, the pathogenic organism is a virus, which may be an RNA virus or
a DNA virus. Preferably, the RNA virus is Human Immunodeficiency Virus (HIV),
Poliovirus, Influenza virus, Rous sarcoma virus, or a Flavivirus such as Japanese
encephalitis virus. In a preferred embodiment, the RNA virus is a Hepatitis virus including,
but not limited to, Hepatitis strains A, B and C. Suitably, the DNA virus is a Herpesvirus
including, but not limited to, Herpes simplex virus, Epstein-Barr virus, Cytomegalovirus
and Parvovirus. In a preferred embodiment, the virus is HIV and the parent polypeptide is
suitably selected from env, gag, pol, vif, vpr, tat, rev, vpu and nef, or combination thereof.
In an alternate preferred embodiment, the virus is Hepatitis C1a virus and the parent
polypeptide is the Hepatitis C1a virus polyprotein.

In another embodiment, the pathogenic organism is a bacterium, which includes,
but is not restricted to, Neisseria species, Meningococcal species, Haemophilus species,
Salmonella species, Streptococcal species, Legionella species and Mycobacterium species.

In yet another embodiment, the pathogenic organism is a parasite, which includes,
but is not restricted to, Plasmodium species, Schistosoma species, Leishmania species,
Trypanosoma species, Toxoplasma species and Giardia species.

Any cancer or tumour is contemplated by the present invention. For example, the
cancer or tumour includes, but is not restricted to, melanoma, lung cancer, breast cancer,
cervical cancer, prostate cancer, colon cancer, pancreatic cancer, stomach cancer, bladder
cancer, kidney cancer, post transplant lymphoproliferative disease (PTLD), Hodgkin's
Lymphoma and the like. Preferably, the cancer or tumour relates to melanoma. In a
preferred embodiment of this type, the parent polypeptide is a melanocyte differentiation
antigen which is suitably selected from gp100, MART, TRP-1, Tyros, TRP2, MC1R,
MUC1F, MUC1R or a combination thereof. In an alternate preferred embodiment of this
type, the parent polypeptide is a melanoma-specific antigen which is suitably selected from
BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a,
NYNSO1b, LAGE1 or a combination thereof.

In a preferred embodiment, the segments are selected on the basis of size. A
segment according to the invention may be of any suitable size that can be utilised to elicit
an immune response against an antigen encoded by the parent polypeptide. A number of
factors can influence the choice of segment size. For example, the size of a segment should
be preferably chosen such that it includes, or corresponds to the size of, T cell epitopes and
their processing requirement. Practitioners in the art will recognise that class I-restricted T
cell epitopes can be between 8 and 10 amino acids in length and if placed next to unnatural
flanking residues, such epitopes can generally require 2 to 3 natural flanking amino acids
to ensure that they are efficiently processed and presented. Class II-restricted T cell
epitopes can range between 12 and 25 amino acids in length and may not require natural
flanking residues for efficient proteolytic processing although it is believed that natural
flanking residues may play a role. Another important feature of class II-restricted epitopes
is that they generally contain a core of 9-10 amino acids in the middle which bind
specifically to class II MHC molecules with flanking sequences either side of this core
stabilising binding by associating with conserved structures on either side of class II MHC
antigens in a sequence independent manner (Brown et al., 1993). Thus the functional
region of class II-restricted epitopes is typically less than 15 amino acids long. The size of
linear B cell epitopes and the factors affecting their processing, like class II-restricted
epitopes, are quite variable although such epitopes are frequently smaller in size than 15
amino acids. From the foregoing, it is preferable, but not essential, that the size of the
segment is at least 4 amino acids, preferably at least 7 amino acids, more preferably at least
12 amino acids, more preferably at least 20 amino acids and more preferably at least 30
amino acids. Suitably, the size of the segment is less than 2000 amino acids, more
preferably less than 1000 amino acids, more preferably less than 500 amino acids, more
preferably less than 200 amino acids, more preferably less than 100 amino acids, more
preferably less than 80 amino acids and even more preferably less than 60 amino acids and
still even more preferably less than 40 amino acids. In this regard, it is preferable that the
size of the segments is as small as possible so that the synthetic polypeptide adopts a
functionally different structure relative to the structure of the parent polypeptide. It is also
preferable that the size of the segments is large enough to minimise loss of T cell epitopes.
In an especially preferred embodiment, the size of the segment is about 30 amino acids.

An optional spacer may be utilised to space adjacent segments relative to each
other. Accordingly, an optional spacer may be interposed between some or all of the
segments. The spacer suitably alters proteolytic processing and/or presentation of adjacent
segment(s). In a preferred embodiment of this type, the spacer promotes or otherwise
enhances proteolytic processing and/or presentation of adjacent segment(s). Preferably, the
spacer comprises at least one amino acid. The at least one amino acid is suitably a neutral
amino acid. The neutral amino acid is preferably alanine. Alternatively, the at least one
amino acid is cysteine.
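
As a purely illustrative sketch, the interposition of a spacer between some or all segments might be expressed in Python as follows. The function name link_segments, its parameters and the three-residue example segments are hypothetical; the default single-alanine spacer simply reflects the preference stated above.

```python
def link_segments(segments, spacer="A", junctions=None):
    """Join segments in the given order, optionally inserting a spacer.

    junctions: indices i at which the spacer is placed between
    segments[i] and segments[i + 1]; None means every junction.
    """
    segments = list(segments)
    if not segments:
        return ""
    if junctions is None:
        junctions = range(len(segments) - 1)
    chosen = set(junctions)
    parts = [segments[0]]
    for i, segment in enumerate(segments[1:]):
        if i in chosen:
            parts.append(spacer)  # spacer interposed at this junction
        parts.append(segment)
    return "".join(parts)


# Spacer between all segments, then only between the second and third.
print(link_segments(["MKT", "LLV", "GHW"]))                             # MKTALLVAGHW
print(link_segments(["MKT", "LLV", "GHW"], spacer="AA", junctions=[1]))  # MKTLLVAAGHW
```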

In a preferred embodiment, segments are selected such that they have partial
sequence identity or homology with one or more other segments. Suitably, at one or both
ends of a respective segment there is comprised at least 4 contiguous amino acids,
preferably at least 7 contiguous amino acids, more preferably at least 10 contiguous amino
acids, more preferably at least 15 contiguous amino acids and even more preferably at least
20 contiguous amino acids that are identical to, or homologous with, an amino acid
sequence contained within one or more other of said segments. Preferably, at the or each
end of a respective segment there is comprised less than 500 contiguous amino acids, more
preferably less than 200 contiguous amino acids, more preferably less than 100 contiguous
amino acids, more preferably less than 50 contiguous amino acids, more preferably less
than 40 contiguous amino acids, and even more preferably less than 30 contiguous amino
acids that are identical to, or homologous with, an amino acid sequence contained within
one or more other of said segments. Such sequence overlap (also referred to elsewhere in
the specification as "overlapping fragments" or "overlapping segments") is preferable to
ensure potential epitopes at segment boundaries are not lost and to ensure that epitopes at
or near segment boundaries are processed efficiently if placed beside or near amino acids
that inhibit processing. Preferably, the segment size is about twice the size of the overlap.
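
The division of a parent sequence into overlapping segments can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name overlapping_segments is hypothetical, the defaults (30-residue segments with a 15-residue overlap) follow the preference that segment size is about twice the overlap, and a shorter final segment is simply kept when the parent length is not an exact fit.

```python
def overlapping_segments(sequence, segment_size=30, overlap=15):
    """Split a sequence into segments that share `overlap` residues
    with each neighbouring segment."""
    if segment_size <= 0 or not 0 <= overlap < segment_size:
        raise ValueError("require 0 <= overlap < segment_size")
    step = segment_size - overlap
    segments = []
    for start in range(0, len(sequence), step):
        segments.append(sequence[start:start + segment_size])
        if start + segment_size >= len(sequence):
            break  # the end of the parent sequence has been reached
    return segments


# A made-up 90-residue string stands in for a parent polypeptide.
parent = "".join(chr(65 + i % 26) for i in range(90))
segments = overlapping_segments(parent)
print(len(segments))                          # 5 segments
print(segments[0][-15:] == segments[1][:15])  # True: shared 15 residues
```

Because every residue other than those at the extreme ends of the parent appears in two segments, an epitope that straddles one segment boundary lies wholly inside the neighbouring segment, provided it is no longer than the overlap.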

In a preferred embodiment, when segments have partial sequence homology 
therebetween, the homologous sequences suitably comprise conserved and/or
non-conserved amino acid differences. Exemplary conservative substitutions are listed in the
following table. 

TABLE B

Original residue    Exemplary substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu

Conserved or non-conserved differences may correspond to polymorphisms in 
corresponding parent polypeptides. Polymorphic polypeptides are expressed by various 
pathogenic organisms and cancers. For example, the polymorphic polypeptides may be 
expressed by different viral strains or clades or by cancers in different individuals.

Sequence overlap between respective segments is preferable to minimise
destruction of any epitope sequences that may result from any shuffling or rearrangement
of the segments relative to their existing order in the parent polypeptide. If overlapping
segments as described above are employed to form a synthetic polypeptide, it may not be
necessary to change the order in which those segments are linked together relative to the
order in which corresponding segments are normally present in the parent polypeptide. In
this regard, such overlapping segments when linked together in the synthetic polypeptide
can adopt a different structure relative to the structure of the parent polypeptide, wherein
the different structure does not provide for one or more functions associated with the
parent polypeptide. For example, in the case of four segments A-B-C-D each spanning 30
contiguous amino acids of the parent polypeptide and having a 10-amino acid overlapping
sequence with one or more adjacent segments, the synthetic polypeptide will have
duplicated 10-amino acid sequences bridging segments A-B, B-C and C-D. The presence
of these duplicated sequences may be sufficient to render a different structure and to
abrogate or alter function relative to the parent polypeptide.

In a preferred embodiment, segment size is about 30 amino acids and sequence
overlap at one or both ends of a respective segment is about 15 amino acids. However, it
will be understood that other suitable segment sizes and sequence overlap sizes are
contemplated by the present invention, which can be readily ascertained by persons of skill
in the art.
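
By way of illustration only, the division of a parent polypeptide into overlapping segments may be sketched as follows. The function name, the randomly generated 90-residue parent sequence and the fixed random seed are assumptions made for the example and form no part of the disclosure; the default values reflect the preferred segment size of about 30 and overlap of about 15 amino acids, while the worked call uses the 30/10 arrangement of the A-B-C-D example.

```python
import random

def overlapping_segments(parent, size=30, overlap=15):
    """Split `parent` into segments of `size` residues, each sharing
    `overlap` residues with the segment that follows it."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(parent), step):
        segments.append(parent[start:start + size])
        if start + size >= len(parent):
            break  # the last segment has reached the end of the parent
    return segments

# Hypothetical 90-residue parent sequence, generated only for illustration.
rng = random.Random(0)
parent = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(90))

# Four segments A-B-C-D of 30 residues with a 10-residue overlap.
segments = overlapping_segments(parent, size=30, overlap=10)

# Linking the segments in parent order duplicates each 10-residue bridge.
synthetic = "".join(segments)
```

In this sketch the linked product is 120 residues long although the parent is 90 residues long, the additional 30 residues being the three duplicated bridging sequences.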

It is preferable but not necessary to utilise all the segments of the parent
polypeptide in the construction of the synthetic polypeptide. Suitably, at least 30%,
preferably at least 40%, more preferably at least 50%, even more preferably at least 60%,
even more preferably at least 70%, even more preferably at least 80% and still even more
preferably at least 90% of the parent polypeptide sequence is used in the construction of
the synthetic polypeptide. However, it will be understood that the more sequence
information from a parent polypeptide that is utilised to construct the synthetic
polypeptide, the greater the population coverage will be of the synthetic polypeptide as an
immunogen. Preferably, no sequence information from the parent polypeptide is excluded
(e.g., because of an apparent lack of immunological epitopes).

Persons of skill in the art will appreciate that when preparing a synthetic
polypeptide against a pathogenic organism (e.g., a virus) or a cancer, it may be preferable
to use sequence information from a plurality of different polypeptides expressed by the
organism or the cancer. Accordingly, in a preferred embodiment, segments from a plurality
of different polypeptides are linked together to form a synthetic polypeptide according to
the invention. It is preferable in this respect to utilise as many parent polypeptides as
possible from, or in relation to, a particular source in the construction of the synthetic
polypeptide. The source of parent polypeptides includes, but is not limited to, a pathogenic
organism and a cancer. Suitably, at least about 30%, preferably at least 40%, more
preferably at least 50%, even more preferably at least 60%, even more preferably at least
70%, even more preferably at least 80% and still even more preferably at least 90% of the
parent polypeptides expressed by the source are used in the construction of the synthetic
polypeptide. Preferably, parent polypeptides from a virus include, but are not restricted to,
latent polypeptides, regulatory polypeptides or polypeptides expressed early during the
viral replication cycle. Suitably, parent polypeptides from a parasite or bacterium include, but
are not restricted to, secretory polypeptides and polypeptides expressed on the surface of
the parasite or bacterium. It is preferred that parent polypeptides from a cancer or tumour are
cancer specific polypeptides.

Suitably, hypervariable sequences within the parent polypeptide are excluded 
from the construction of the synthetic polypeptide. 

The synthetic polypeptides of the invention may be prepared by any suitable
procedure known to those of skill in the art. For example, the polypeptide may be
synthesised using solution synthesis or solid phase synthesis as described, for example, in
Chapter 9 of Atherton and Shephard (1989, Solid Phase Peptide Synthesis: A Practical
Approach. IRL Press, Oxford) and in Roberge et al. (1995, Science 269: 202). Syntheses
may employ, for example, either t-butyloxycarbonyl (t-Boc) or
9-fluorenylmethyloxycarbonyl (Fmoc) chemistries (see Chapter 9.1 of Coligan et al.,
CURRENT PROTOCOLS IN PROTEIN SCIENCE, John Wiley & Sons, Inc. 1995-1997;
Stewart and Young, 1984, Solid Phase Peptide Synthesis, 2nd ed. Pierce Chemical Co.,
Rockford, Ill.; and Atherton and Shephard, supra).

Alternatively, the polypeptides may be prepared by a procedure including the
steps of:

(a) preparing a synthetic construct including a synthetic polynucleotide encoding
a synthetic polypeptide wherein said synthetic polynucleotide is operably linked to a
regulatory polynucleotide, wherein said synthetic polypeptide comprises a plurality of
different segments of a parent polypeptide, wherein said segments are linked together
in a different relationship relative to their linkage in the parent polypeptide;

(b) introducing the synthetic construct into a suitable host cell;

(c) culturing the host cell to express the synthetic polypeptide from said synthetic
construct; and

(d) isolating the synthetic polypeptide.

The synthetic construct is preferably in the form of an expression vector. For
example, the expression vector can be a self-replicating extra-chromosomal vector such as
a plasmid, or a vector that integrates into a host genome. Typically, the regulatory
polynucleotide may include, but is not limited to, promoter sequences, leader or signal
sequences, ribosomal binding sites, transcriptional start and stop sequences, translational
start and termination sequences, and enhancer or activator sequences. Constitutive or
inducible promoters as known in the art are contemplated by the invention. The promoters
may be either naturally occurring promoters, or hybrid promoters that combine elements of
more than one promoter. The regulatory polynucleotide will generally be appropriate for
the host cell used for expression. Numerous types of appropriate expression vectors and
suitable regulatory polynucleotides are known in the art for a variety of host cells.

In a preferred embodiment, the expression vector contains a selectable marker
gene to allow the selection of transformed host cells. Selection genes are well known in the
art and will vary with the host cell used.

The expression vector may also include a fusion partner (typically provided by the
expression vector) so that the synthetic polypeptide of the invention is expressed as a
fusion polypeptide with said fusion partner. The main advantage of fusion partners is that
they assist identification and/or purification of said fusion polypeptide. In order to express
said fusion polypeptide, it is necessary to ligate a polynucleotide according to the invention
into the expression vector so that the translational reading frames of the fusion partner and
the polynucleotide coincide.

Well known examples of fusion partners include, but are not limited to,
glutathione-S-transferase (GST), Fc portion of human IgG, maltose binding protein (MBP)
and hexahistidine (HIS6), which are particularly useful for isolation of the fusion
polypeptide by affinity chromatography. For the purposes of fusion polypeptide
purification by affinity chromatography, relevant matrices for affinity chromatography are
glutathione-, amylose-, and nickel- or cobalt-conjugated resins respectively. Many such
matrices are available in "kit" form, such as the QIAexpress™ system (Qiagen) useful with
(HIS6) fusion partners and the Pharmacia GST purification system. In a preferred
embodiment, the recombinant polynucleotide is expressed in the commercial vector
pFLAG™.

Another fusion partner well known in the art is green fluorescent protein (GFP).
This fusion partner serves as a fluorescent "tag" which allows the fusion polypeptide of the
invention to be identified by fluorescence microscopy or by flow cytometry. The GFP tag
is useful when assessing subcellular localisation of a fusion polypeptide of the invention,
or for isolating cells which express a fusion polypeptide of the invention. Flow cytometric
methods such as fluorescence activated cell sorting (FACS) are particularly useful in this
latter application. Preferably, the fusion partners also have protease cleavage sites, such as
for Factor Xa, Thrombin and inteins (protein introns), which allow the relevant protease to
partially digest the fusion polypeptide of the invention and thereby liberate the
recombinant polypeptide of the invention therefrom. The liberated polypeptide can then be
isolated from the fusion partner by subsequent chromatographic separation. Fusion
partners according to the invention also include within their scope "epitope tags", which
are usually short peptide sequences for which a specific antibody is available. Well known
examples of epitope tags for which specific monoclonal antibodies are readily available
include c-Myc, influenza virus haemagglutinin and FLAG tags. Alternatively, a fusion
partner may be provided to promote other forms of immunity. For example, the fusion
partner may be an antigen-binding molecule that is immuno-interactive with a
conformational epitope on a target antigen or with a post-translational modification of a
target antigen (e.g., an antigen-binding molecule that is immuno-interactive with a
glycosylated target antigen).

The step of introducing the synthetic construct into the host cell may be effected
by any suitable method including transfection and transformation, the choice of which will
be dependent on the host cell employed. Such methods are well known to those of skill in
the art.

Synthetic polypeptides of the invention may be produced by culturing a host cell 
transformed with the synthetic construct. The conditions appropriate for protein expression 
will vary with the choice of expression vector and the host cell. This is easily ascertained 
by one skilled in the art through routine experimentation. 

Suitable host cells for expression may be prokaryotic or eukaryotic. One preferred
host cell for expression of a polypeptide according to the invention is a bacterium. The
bacterium used may be Escherichia coli. Alternatively, the host cell may be an insect cell
such as, for example, SF9 cells that may be utilised with a baculovirus expression system.

The synthetic polypeptide may be conveniently prepared by a person skilled in the
art using standard protocols as for example described in Sambrook et al., MOLECULAR
CLONING. A LABORATORY MANUAL (Cold Spring Harbor Press, 1989), in particular
Sections 16 and 17; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY (John Wiley & Sons, Inc. 1994-1998), in particular Chapters 10 and 16; and
Coligan et al., CURRENT PROTOCOLS IN PROTEIN SCIENCE (John Wiley & Sons,
Inc. 1995-1997), in particular Chapters 1, 5 and 6.

The amino acids of the synthetic polypeptide can be any non-naturally occurring
or any naturally occurring amino acid. Examples of unnatural amino acids and derivatives
that may be incorporated during peptide synthesis include, but are not limited to,
4-amino butyric acid, 6-aminohexanoic acid, 4-amino-3-hydroxy-5-phenylpentanoic acid,
4-amino-3-hydroxy-6-methylheptanoic acid, t-butylglycine, norleucine, norvaline,
phenylglycine, ornithine, sarcosine, 2-thienyl alanine and/or D-isomers of amino acids.
A list of unnatural amino acids contemplated by the present invention is shown in TABLE C.



TABLE C

α-aminobutyric acid
L-N-methylalanine
α-amino-α-methylbutyrate
L-N-methylarginine
aminocyclopropane-carboxylate
L-N-methylasparagine
aminoisobutyric acid
L-N-methylaspartic acid
aminonorbornyl-carboxylate
L-N-methylcysteine
cyclohexylalanine
L-N-methylglutamine
cyclopentylalanine
L-N-methylglutamic acid
L-N-methylisoleucine
L-N-methylhistidine
D-alanine
L-N-methylleucine
D-arginine
L-N-methyllysine
D-aspartic acid
L-N-methylmethionine
D-cysteine
L-N-methylnorleucine
D-glutamate
L-N-methylnorvaline
D-glutamic acid
L-N-methylornithine
[illegible]
D-histidine
L-N-methylphenylalanine
D-isoleucine
L-N-methylproline
D-leucine
L-N-methylserine
D-lysine
L-N-methylthreonine
D-methionine
L-N-methyltryptophan
D-ornithine
L-N-methyltyrosine
D-phenylalanine
L-N-methylvaline
D-proline
L-N-methylethylglycine
D-serine
L-N-methyl-t-butylglycine
D-threonine
L-norleucine
D-tryptophan
L-norvaline
D-tyrosine
α-methyl-aminoisobutyrate
D-valine
α-methyl-γ-aminobutyrate
D-α-methylalanine
α-methylcyclohexylalanine
D-α-methylarginine
α-methylcyclopentylalanine
D-α-methylasparagine
α-methyl-α-naphthylalanine
D-α-methylaspartate
α-methylpenicillamine
D-α-methylcysteine
N-(4-aminobutyl)glycine
D-α-methylglutamine
N-(2-aminoethyl)glycine
D-α-methylhistidine
N-(3-aminopropyl)glycine
D-α-methylisoleucine
N-amino-α-methylbutyrate
D-α-methylleucine
α-naphthylalanine
D-α-methyllysine
N-benzylglycine
D-α-methylmethionine
N-(2-carbamylethyl)glycine
D-α-methylornithine
N-(carbamylmethyl)glycine
D-α-methylphenylalanine
N-(2-carboxyethyl)glycine
D-α-methylproline
N-(carboxymethyl)glycine
D-α-methylserine
N-cyclobutylglycine
D-α-methylthreonine
N-cycloheptylglycine
D-α-methyltryptophan
N-cyclohexylglycine
D-α-methyltyrosine
N-cyclodecylglycine
L-α-methylleucine
L-α-methyllysine
L-α-methylmethionine
L-α-methylnorleucine
L-α-methylnorvaline
L-α-methylornithine
L-α-methylphenylalanine
L-α-methylproline
L-α-methylserine
L-α-methylthreonine
L-α-methyltryptophan
L-α-methyltyrosine
L-α-methylvaline
L-N-methylhomophenylalanine
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-ethylamino)cyclopropane





The invention also contemplates modifying the synthetic polypeptides of the
invention using ordinary molecular biological techniques so as to alter their resistance to
proteolytic degradation or to optimise solubility properties or to render them more suitable
as an immunogenic agent.

3. Preparation of synthetic polynucleotides of the invention 

The invention contemplates synthetic polynucleotides encoding the synthetic
polypeptides as for example described in Section 2 supra. Polynucleotides encoding
segments of a parent polypeptide can be produced by any suitable technique. For example,
such polynucleotides can be synthesised de novo using readily available machinery.
Sequential synthesis of DNA is described, for example, in U.S. Patent No 4,293,652.
Instead of de novo synthesis, recombinant techniques may be employed including use of
restriction endonucleases to cleave a polynucleotide encoding at least a segment of the
parent polypeptide and use of ligases to ligate together in frame a plurality of cleaved
polynucleotides encoding different segments of the parent polypeptide. Suitable
recombinant techniques are described for example in the relevant sections of Ausubel et
al. (supra) and of Sambrook et al. (supra), which are incorporated herein by reference.
Preferably, the synthetic polynucleotide is constructed using splicing by overlapping
extension (SOEing) as for example described by Horton et al. (1990, Biotechniques 8(5):
528-535; 1995, Mol. Biotechnol. 3(2): 93-99; and 1997, Methods Mol. Biol. 67: 141-149).
However, it should be noted that the present invention is not dependent on, and not
directed to, any one particular technique for constructing the synthetic construct.

Various modifications to the synthetic polynucleotides may be introduced as a
means of increasing intracellular stability and half-life. Possible modifications include but
are not limited to the addition of flanking sequences of ribo- or deoxy-nucleotides to the 5'
and/or 3' ends of the molecule or the use of phosphorothioate or 2'-O-methyl rather than
phosphodiester linkages within the oligodeoxyribonucleotide backbone.

The invention therefore contemplates a method of producing a synthetic
polynucleotide as broadly described above, comprising linking together in the same
reading frame at least two nucleic acid sequences encoding different segments of a parent
polypeptide to form a synthetic polynucleotide, which encodes a synthetic polypeptide
according to the invention. Suitably, nucleic acid sequences encoding at least 10 segments,
preferably at least 20 segments, more preferably at least 40 segments and more preferably
at least 100 segments of a parent polypeptide are employed to produce the synthetic
polynucleotide.

Preferably, the method further comprises selecting segments of the parent
polypeptide, reverse translating the selected segments and preparing nucleic acid
sequences encoding the selected segments. It is preferred that the method further comprises
randomly linking the nucleic acid sequences together to form the synthetic polynucleotide.
The nucleic acid sequences may be oligonucleotides or polynucleotides.
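
The steps of reverse translating selected segments and randomly linking the resulting nucleic acid sequences in the same reading frame may be sketched as follows. This is a minimal illustration under stated assumptions: the codon chosen for each amino acid is one arbitrary codon of the standard genetic code, and the three peptide segments, the function names and the seed are hypothetical.

```python
import random

# One codon per amino acid from the standard genetic code. The particular
# codon chosen for each residue is an illustrative assumption.
CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}

def reverse_translate(peptide):
    """Return a DNA sequence encoding `peptide`, one codon per residue."""
    return "".join(CODON[aa] for aa in peptide)

def link_randomly(segments, seed=None):
    """Shuffle the segments and link their coding sequences in frame."""
    rng = random.Random(seed)
    order = list(segments)
    rng.shuffle(order)
    return "".join(reverse_translate(s) for s in order), order

# Hypothetical peptide segments, for illustration only.
segments = ["MKTAYIAK", "QRQISFVK", "SHFSRQLE"]
dna, order = link_randomly(segments, seed=1)
```

Because every segment is reverse translated into whole codons, simple concatenation keeps all segments in the same reading frame irrespective of the order produced by the shuffle.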

Suitably, segments are selected on the basis of size. Additionally, or in the
alternative, segments are selected such that they have partial sequence identity or
homology (i.e., sequence overlap) with one or more other segments. A number of factors
can influence segment size and sequence overlap as mentioned above. In the case of
sequence overlap, large amounts of duplicated nucleic acid sequences can sometimes result
in sections of nucleic acid being lost during nucleic acid amplification (e.g., polymerase
chain reaction, PCR) of such sequences, recombinant plasmid propagation in a bacterial
host or during amplification of recombinant viruses containing such sequences.
Accordingly, in a preferred embodiment, nucleic acid sequences encoding segments having
sequence identity or homology with one or more other encoded segments are not linked
together in an arrangement in which the identical or homologous sequences are contiguous.
Also, it is preferable that different codons are used to encode a specific amino acid in a
duplicated region. In this context, an amino acid of a parent polypeptide sequence is
preferably reverse translated to provide a codon which, in the context of adjacent or local
sequence elements, has a lower propensity of forming an undesirable sequence (e.g., a
duplicated sequence or a palindromic sequence) that is refractory to the execution of a task
(e.g., cloning or sequencing). Alternatively, segments may be selected such that they
contain a carboxyl terminal leucine residue or such that reverse translated sequences
encoding the segments contain restriction enzyme sites for convenient splicing of the
reverse translated sequences.
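
The use of different codons to encode the same amino acids in a duplicated region may be illustrated as follows. The table of synonymous codon pairs is drawn from the standard genetic code, but the particular pairs, the function names and the 10-residue duplicated peptide are assumptions chosen for the example only.

```python
# Up to two synonymous codons per amino acid (standard genetic code).
# Met and Trp have a single codon and therefore cannot be varied.
SYNONYMS = {
    "A": ("GCC", "GCT"), "R": ("CGC", "AGA"), "N": ("AAC", "AAT"),
    "D": ("GAC", "GAT"), "C": ("TGC", "TGT"), "Q": ("CAG", "CAA"),
    "E": ("GAG", "GAA"), "G": ("GGC", "GGA"), "H": ("CAC", "CAT"),
    "I": ("ATC", "ATT"), "L": ("CTG", "CTC"), "K": ("AAG", "AAA"),
    "M": ("ATG",), "F": ("TTC", "TTT"), "P": ("CCC", "CCT"),
    "S": ("AGC", "TCC"), "T": ("ACC", "ACT"), "W": ("TGG",),
    "Y": ("TAC", "TAT"), "V": ("GTG", "GTC"),
}
DECODE = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}

def encode(peptide, copy_index):
    """Encode `peptide`, alternating codon choice with each copy."""
    return "".join(
        SYNONYMS[aa][copy_index % len(SYNONYMS[aa])] for aa in peptide
    )

def translate(dna):
    """Translate a DNA sequence made of codons listed in SYNONYMS."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# Hypothetical duplicated 10-residue overlap, for illustration only.
duplicated = "LKETAAAKFE"
first_copy = encode(duplicated, 0)
second_copy = encode(duplicated, 1)

# Number of nucleotide positions that remain identical between the copies.
identical_bases = sum(a == b for a, b in zip(first_copy, second_copy))
```

Both copies translate to the same peptide while differing at the nucleotide level, which is the property relied upon to reduce the amount of directly repeated nucleic acid sequence.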

The method optionally further comprises linking a spacer oligonucleotide
encoding at least one spacer residue between segment-encoding nucleic acids. Such spacer
residue(s) may be advantageous in ensuring that epitopes within the segments are
processed and presented efficiently. Preferably, the spacer oligonucleotide encodes 2 to 3
spacer residues. The spacer residue is suitably a neutral amino acid, which is preferably
alanine.
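
As a brief sketch of the spacer arrangement, segment-encoding sequences may be joined through an oligonucleotide encoding two or three alanine residues. The choice of GCC as the alanine codon, the function name and the two short coding sequences are assumptions made for illustration.

```python
ALA_CODON = "GCC"  # one of the four alanine codons; an arbitrary choice

def link_with_spacers(coding_segments, n_spacer=2):
    """Join segment-encoding sequences with an alanine spacer oligonucleotide."""
    if any(len(s) % 3 for s in coding_segments):
        raise ValueError("each segment must be a whole number of codons")
    return (ALA_CODON * n_spacer).join(coding_segments)

# Two hypothetical segment-encoding sequences (Met-Lys and Phe-Gly).
linked = link_with_spacers(["ATGAAA", "TTTGGG"], n_spacer=2)
```

Since the spacer is itself a whole number of codons, the reading frame of the downstream segment is preserved.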

Optionally, the method further comprises linking in the same reading frame as
other segment-containing nucleic acid sequences at least one variant nucleic acid sequence
which encodes a variant segment having a homologous but not identical amino acid
sequence relative to other encoded segments. Suitably, the variant segment comprises
conserved and/or non-conserved amino acid differences relative to one or more other
encoded segments. Such differences may correspond to polymorphisms as discussed
above. In a preferred embodiment, degenerate bases are designed or built in to the at least
one variant nucleic acid sequence to give rise to all desired homologous sequences.
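
The design of degenerate bases may be sketched with the standard IUPAC nucleotide ambiguity codes. In the example below, which is illustrative only, the function names are hypothetical and the two codon pairs are chosen so that the variants differ at a single position.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes, keyed by the set of bases represented.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
    frozenset("T"): "T", frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
EXPAND = {code: sorted(bases) for bases, code in IUPAC.items()}

def degenerate_codon(codons):
    """Return the degenerate codon covering every codon in `codons`."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*codons))

def expand(degenerate):
    """List every concrete sequence represented by a degenerate sequence."""
    return ["".join(p) for p in product(*(EXPAND[c] for c in degenerate))]

# Asp (GAC) / Glu (GAG) polymorphism covered by a single degenerate codon.
asp_glu = degenerate_codon(["GAC", "GAG"])
# Ile (ATC) / Val (GTC) polymorphism.
ile_val = degenerate_codon(["ATC", "GTC"])
```

Where variant codons differ at more than one position, the degenerate codon may also represent codons for amino acids that were not intended, so the output of `expand` would ordinarily be checked before synthesis.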

When a large number of polymorphisms is intended to be covered, it is preferred
that multiple synthetic polynucleotides are constructed rather than a single synthetic
polynucleotide, which encodes all variant segments. For example, if there is less than 85%
homology between polymorphic polypeptides, then it is preferred that more than one
synthetic polynucleotide is constructed.
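
The 85% criterion may be applied, for example, as in the following sketch. It assumes, purely for illustration, that homology is approximated by percent identity over pre-aligned sequences of equal length; the function names and the short example sequences are hypothetical.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def needs_multiple_polynucleotides(variants, threshold=85.0):
    """True if any pair of variants falls below the identity threshold."""
    return any(
        percent_identity(a, b) < threshold
        for a, b in combinations(variants, 2)
    )

# Hypothetical aligned polymorphic sequences, for illustration only.
close_pair = ["ACDEFGHIKL", "ACDEFGHIKV"]   # 9 of 10 positions identical
distant_pair = ["ACDEFGHIKL", "AADDFGHIKV"]  # 7 of 10 positions identical
```

On these assumptions the first pair could be covered by a single synthetic polynucleotide, whereas the second pair would favour construction of more than one.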

Preferably, the method further comprises optimising the codon composition of the
synthetic polynucleotide such that it is translated efficiently by a host cell. In this regard, it
is well known that the translational efficiency of different codons varies between
organisms and that such differences in codon usage can be utilised to enhance the level of
protein expression in a particular organism. In this regard, reference may be made to Seed
et al. (International Application Publication No WO 96/09378) who disclose the
replacement of existing codons in a parent polynucleotide with synonymous codons to
enhance expression of viral polypeptides in mammalian host cells. Preferably, the first or
second most frequently used codons are employed for codon optimisation.
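
Selection of the first or second most frequently used codon may be sketched as follows. The usage frequencies shown are hypothetical placeholders covering only four amino acids; in practice they would be taken from a codon usage table for the intended host cell. The rule of switching to the second-ranked codon when the first would repeat the preceding codon is likewise an assumption made for the example.

```python
# Hypothetical relative codon frequencies for a host cell (placeholders).
USAGE = {
    "L": {"CTG": 0.40, "CTC": 0.20, "TTG": 0.13, "CTT": 0.13,
          "CTA": 0.07, "TTA": 0.07},
    "K": {"AAG": 0.57, "AAA": 0.43},
    "E": {"GAG": 0.58, "GAA": 0.42},
    "M": {"ATG": 1.00},
}

def preferred_codons(aa, top=2):
    """Return the `top` most frequently used codons for an amino acid."""
    ranked = sorted(USAGE[aa], key=USAGE[aa].get, reverse=True)
    return ranked[:top]

def optimise(peptide):
    """Encode `peptide` using the first or second most frequent codon."""
    codons = []
    for aa in peptide:
        choices = preferred_codons(aa)
        codon = choices[0]
        # Use the second-ranked codon if the first would repeat the
        # immediately preceding codon.
        if codons and codon == codons[-1] and len(choices) > 1:
            codon = choices[1]
        codons.append(codon)
    return "".join(codons)

optimised = optimise("MLLKE")
```

Here the two adjacent leucine residues are encoded by CTG and CTC respectively, both of which rank among the two most frequent leucine codons in the placeholder table.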

Preferably, gene splicing by overlap extension or "gene SOEing" (supra) is
employed for the construction of the synthetic polynucleotide. Gene SOEing is a PCR-based
method of recombining DNA sequences without reliance on restriction sites and of directly
generating mutated DNA fragments in vitro. By modifying the sequences incorporated into
the 5'-ends of the primers, any pair of PCR products can be made to share a common
sequence at one end. Under PCR conditions, the common sequence allows strands from
two different fragments to hybridise to one another, forming an overlap. Extension of this
overlap by DNA polymerase yields a recombinant molecule. However, a problem with
long synthetic constructs is that mutations generally incorporate into amplified products
during synthesis. In this instance, it is preferred that resolvase treatment is employed at
various steps of the synthesis. Resolvases are bacteriophage-encoded endonucleases which
recognise disruptions or mispairing of double stranded DNA and are primarily used by
bacteriophages to resolve Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). For
example, T7 endonuclease I can be employed in synthetic DNA constructions to recognise
mutations and cleave corrupted dsDNA. The mutated DNA strands are then hybridised to
non-mutant or correct DNA sequences, which results in a mispairing of DNA bases. The
mispaired bases are recognised by the resolvase, which then cleaves the DNA nearby
leaving only correctly hybridised sequences intact. Preferably a thermostable resolvase
enzyme is employed during splicing or amplification so that errors are not incorporated in
downstream synthesis products.
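
The joining of two PCR products through a shared end sequence may be modelled at the sequence level as follows. This is a simplified sketch that treats each fragment as its sense strand only and does not model hybridisation or polymerase extension; the function name, the minimum overlap parameter and the example sequences are assumptions.

```python
def soe_join(left, right, min_overlap=15):
    """Return the recombinant sequence formed when the 3' end of `left`
    and the 5' end of `right` share a common sequence."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Search from the longest possible common end sequence downwards.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments share no common end sequence of sufficient length")

# Hypothetical fragments; `common` stands for the sequence introduced
# through the 5'-ends of the primers.
common = "GGCGAGGAGCTG"
left = "ATGGCTAGCAAG" + common
right = common + "TTCACCGGGTAA"
recombinant = soe_join(left, right, min_overlap=10)
```

The recombinant product contains the common sequence once, flanked by the unique portions of the two fragments.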

Synthetic polynucleotides according to the invention can be operably linked to a
regulatory polynucleotide in the form of a synthetic construct as for example described in
Section 2 supra. Synthetic constructs of the invention have utility inter alia as nucleic acid
vaccines. The choice of regulatory polynucleotide and synthetic construct will depend on
the intended host.

Exemplary expression vectors for expression of a synthetic polypeptide according
to the invention include, but are not restricted to, modified Ankara Vaccinia virus as for
example described by Allen et al. (2000, J. Immunol. 164(9): 4968-4978), fowlpox virus as
for example described by Boyle and Coupar (1988, Virus Res. 10: 343-356) and the herpes
simplex amplicons described for example by Fong et al. in U.S. Patent No. 6,051,428.
Alternatively, Adenovirus and Epstein-Barr virus vectors, which are preferably capable of
accepting large amounts of DNA or RNA sequence information, can be used.

Preferred promoter sequences that can be utilised for expression of synthetic
polypeptides include the P7.5 or PE/L promoters as for example disclosed by Kumar and
Boyle (1990, Virology 179: 151-158), CMV and RSV promoters.

The synthetic construct optionally further includes a nucleic acid sequence
encoding an immunostimulatory molecule. The immunostimulatory molecule may be a
fusion partner of the synthetic polypeptide. Alternatively, the immunostimulatory molecule
may be translated separately from the synthetic polypeptide. Preferably, the
immunostimulatory molecule comprises a general immunostimulatory peptide sequence.
For example, the immunostimulatory peptide sequence may comprise a domain of an
invasin protein (Inv) from the bacterium Yersinia spp. as for example disclosed by Brett et al.
(1993, Eur. J. Immunol. 23: 1608-1614). This immune stimulatory property results from
the capability of this invasin domain to interact with the β1 integrin molecules present on T
cells, particularly activated immune or memory T cells. A preferred embodiment of the
invasin domain (Inv) for linkage to a synthetic polypeptide has been previously described
in U.S. Pat. No. 5,759,551. The said Inv domain has the sequence:
Thr-Ala-Lys-Ser-Lys-Lys-Phe-Pro-Ser-Tyr-Thr-Ala-Thr-Tyr-Gln-Phe [SEQ ID NO: 1467] or is an immune
stimulatory homologue thereof from the corresponding region in another Yersinia species
invasin protein. Such homologues thus may contain substitutions, deletions or insertions of
amino acid residues to accommodate strain to strain variation, provided that the
homologues retain immune stimulatory properties. The general immunostimulatory
sequence may optionally be linked to the synthetic polypeptide by a spacer sequence.

In an alternate embodiment, the immunostimulatory molecule may comprise an
immunostimulatory membrane or soluble molecule, which is suitably a T cell
co-stimulatory molecule. Preferably, the T cell co-stimulatory molecule is a B7 molecule or a
biologically active fragment thereof, or a variant or derivative of these. The B7 molecule
includes, but is not restricted to, B7-1 and B7-2. Preferably, the B7 molecule is B7-1.
Alternatively, the T cell co-stimulatory molecule may be an ICAM molecule such as
ICAM-1 and ICAM-2.

In another embodiment, the immunostimulatory molecule can be a cytokine,
which includes, but is not restricted to, an interleukin, a lymphokine, tumour necrosis
factor and an interferon. Alternatively, the immunostimulatory molecule may comprise an
immunomodulatory oligonucleotide as for example disclosed by Krieg in U.S. Patent No.
6,008,200.

Suitably, the size of the synthetic polynucleotide does not exceed the ability of
host cells to transcribe, translate or proteolytically process and present epitopes to the
immune system. Practitioners in the art will also recognise that the size of the synthetic
polynucleotide can impact on the capacity of an expression vector to express the synthetic
polynucleotide in a host cell. In this connection, it is known that the efficacy of DNA
vaccination reduces with expression vectors greater than 20 kb. In such situations it is
preferred that a larger number of smaller synthetic constructs is utilised rather than a single
large synthetic construct.
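
The preference for several smaller constructs over one large construct may be illustrated by a simple partitioning sketch. The greedy grouping strategy, the function name, the 8 kb vector backbone and the insert lengths are all assumptions made for the example; only the 20 kb figure is taken from the passage above.

```python
def partition_inserts(segment_lengths, backbone_bp, max_vector_bp=20_000):
    """Group insert lengths (in bp) so that backbone plus inserts in each
    construct does not exceed `max_vector_bp`."""
    capacity = max_vector_bp - backbone_bp
    if capacity <= 0:
        raise ValueError("backbone alone reaches the size limit")
    groups, current, used = [], [], 0
    for length in segment_lengths:
        if length > capacity:
            raise ValueError("an insert is too long for any single construct")
        if used + length > capacity:
            groups.append(current)  # start a new construct
            current, used = [], 0
        current.append(length)
        used += length
    if current:
        groups.append(current)
    return groups

# Hypothetical insert lengths (bp) and an assumed 8 kb vector backbone.
constructs = partition_inserts([6000, 5000, 4000, 3000], backbone_bp=8000)
```

With these assumed values the 18 kb of insert sequence is distributed over two constructs of 19 kb and 15 kb rather than one construct of 26 kb.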

4. Immunopotentiating compositions 

The invention also contemplates a composition, comprising an
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
described in Section 2, and a synthetic polynucleotide or a synthetic construct as described
in Section 3, together with a pharmaceutically acceptable carrier. One or more
immunopotentiating agents can be used as actives in the preparation of
immunopotentiating compositions. Such preparation uses routine methods known to
persons skilled in the art. Typically, such compositions are prepared as injectables, either
as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in,
liquid prior to injection may also be prepared. The preparation may also be emulsified. The
active immunogenic ingredients are often mixed with excipients that are pharmaceutically
acceptable and compatible with the active ingredient. Suitable excipients are, for example,
water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition,
if desired, the vaccine may contain minor amounts of auxiliary substances such as wetting
or emulsifying agents, pH buffering agents, and/or adjuvants that enhance the effectiveness
of the vaccine. Examples of adjuvants which may be effective include but are not limited
to: aluminium hydroxide, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE), and RIBI,
which contains three components extracted from bacteria, monophosphoryl lipid A,
trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in a 2% squalene/Tween
80 emulsion. For example, the effectiveness of an adjuvant may be determined by
measuring the amount of antibodies resulting from the administration of the composition,
wherein those antibodies are directed against one or more antigens presented by the treated
cells of the composition.

The immunopotentiating agents may be formulated into a composition as neutral
or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed
with free amino groups of the peptide) and which are formed with inorganic acids such as,
for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic,
tartaric, maleic, and the like. Salts formed with the free carboxyl groups may also be
derived from inorganic bases such as, for example, sodium, potassium, ammonium,
calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
2-ethylamino ethanol, histidine, procaine, and the like.

If desired, devices or compositions containing the immunopotentiating agents
suitable for sustained or intermittent release could be, in effect, implanted in the body or
topically applied thereto for the relatively slow release of such materials into the body.

The compositions are conventionally administered parenterally, by injection, for
example, either subcutaneously or intramuscularly. Additional formulations which are
suitable for other modes of administration include suppositories and, in some cases, oral
formulations. For suppositories, traditional binders and carriers may include, for example,
polyalkylene glycols or triglycerides; such suppositories may be formed from mixtures
containing the active ingredient in the range of 0.5% to 10%, preferably 1%-2%. Oral
formulations include such normally employed excipients as, for example, pharmaceutical
grades of mannitol, lactose, starch, magnesium carbonate, and the like. These compositions
take the form of solutions, suspensions, tablets, pills, capsules, sustained release
formulations or powders and contain 10%-95% of active ingredient, preferably 25%-70%.

Administration of the gene therapy construct to said mammal, preferably a
human, may include delivery via direct oral intake, systemic injection, or delivery to
selected tissue(s) or cells, or indirectly via delivery to cells isolated from the mammal or a
compatible donor. An example of the latter approach would be stem cell therapy, wherein
isolated stem cells having potential for growth and differentiation are transfected with the
vector comprising the Sox18 nucleic acid. The stem cells are cultured for a period and then
transferred to the mammal being treated.

With regard to nucleic acid based compositions, all modes of delivery of such
compositions are contemplated by the present invention. Delivery of these compositions to
cells or tissues of an animal may be facilitated by microprojectile bombardment, liposome
mediated transfection (e.g., lipofectin or lipofectamine), electroporation, calcium
phosphate or DEAE-dextran-mediated transfection, for example. In an alternate
embodiment, a synthetic construct may be used as a therapeutic or prophylactic
composition in the form of a "naked DNA" composition as is known in the art. A
discussion of suitable delivery methods may be found in Chapter 9 of CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (Eds. Ausubel et al.; John Wiley & Sons
Inc., 1997 Edition) or on the Internet site DNAvaccine.com. The compositions may be
administered by intradermal (e.g., using panjet™ delivery) or intramuscular routes.

The step of introducing the synthetic polynucleotide into a target cell will differ
depending on the intended use and species, and can involve one or more of non-viral and
viral vectors, cationic liposomes, retroviruses, and adenoviruses such as, for example,
described in Mulligan, R.C., (1993 Science 260 926-932) which is hereby incorporated by
reference. Such methods can include, for example:

A. Local application of the synthetic polynucleotide by injection (Wolff et al., 1990,
Science 247 1465-1468, which is hereby incorporated by reference), surgical
implantation, instillation or any other means. This method can also be used in
combination with local application by injection, surgical implantation, instillation or
any other means, of cells responsive to the protein encoded by the synthetic
polynucleotide so as to increase the effectiveness of that treatment. This method can
also be used in combination with local application by injection, surgical implantation,
instillation or any other means, of another factor or factors required for the activity of
said protein.

B. General systemic delivery by injection of DNA (Calabretta et al., 1993, Cancer Treat.
Rev. 19 169-179, which is incorporated herein by reference), or RNA, alone or in
combination with liposomes (Zhu et al., 1993, Science 261 209-212, which is
incorporated herein by reference), viral capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is incorporated herein by reference) or any
other mediator of delivery. Improved targeting might be achieved by linking the
synthetic polynucleotide to a targeting molecule (the so-called "magic bullet" approach
employing, for example, an antibody), or by local application by injection, surgical
implantation or any other means, of another factor or factors required for the activity of
the protein encoded by said synthetic polynucleotide, or of cells responsive to said
protein.

C. Injection or implantation or delivery by any means, of cells that have been modified ex
vivo by transfection (for example, in the presence of calcium phosphate: Chen et al.,
1987, Mole. Cell Biochem. 7 2745-2752, or of cationic lipids and polyamines: Rose et
al., 1991, BioTech. 10 520-525, which articles are incorporated herein by reference),
infection, injection, electroporation (Shigekawa et al., 1988, BioTech. 6 742-751,
which is incorporated herein by reference) or any other way so as to increase the
expression of said synthetic polynucleotide in those cells. The modification can be
mediated by plasmid, bacteriophage, cosmid, viral (such as adenoviral or retroviral;
Mulligan, 1993, Science 260 926-932; Miller, 1992, Nature 357 455-460; Salmons et
al., 1993, Hum. Gen. Ther. 4 129-141, which articles are incorporated herein by
reference) or other vectors, or other agents of modification such as liposomes (Zhu et
al., 1993, Science 261 209-212, which is incorporated herein by reference), viral
capsids or nanoparticles (Bertling et al., 1991, Biotech. Appl. Biochem. 13 390-405,
which is incorporated herein by reference), or any other mediator of modification. The
use of cells as a delivery vehicle for genes or gene products has been described by Barr
et al., 1991, Science 254 1507-1512 and by Dhawan et al., 1991, Science 254 1509-1512,
which articles are incorporated herein by reference. Treated cells can be
delivered in combination with any nutrient, growth factor, matrix or other agent that
will promote their survival in the treated subject.

Also encompassed by the present invention is a method for treatment and/or
prophylaxis of a disease or condition, comprising administering to a patient in need of such
treatment a therapeutically effective amount of a composition as broadly described above.
The disease or condition may be caused by a pathogenic organism or a cancer as for
example described above.

In a preferred embodiment, the immunopotentiating composition of the invention
is suitable for treatment of, or prophylaxis against, a cancer. Cancers which could be
suitably treated in accordance with the practices of this invention include cancers of the
lung, breast, ovary, cervix, colon, head and neck, pancreas, prostate, stomach, bladder,
kidney, bone, liver, oesophagus, brain, testicle, uterus, melanoma and the various leukemias
and lymphomas.

In an alternate embodiment, the immunopotentiating composition is suitable for
treatment of, or prophylaxis against, a viral, bacterial or parasitic infection. Viral infections
contemplated by the present invention include, but are not restricted to, infections caused
by HIV, Hepatitis, Influenza, Japanese encephalitis virus, Epstein-Barr virus and
respiratory syncytial virus. Bacterial infections include, but are not restricted to, those
caused by Neisseria species, Meningococcal species, Haemophilus species, Salmonella
species, Streptococcal species, Legionella species and Mycobacterium species. Parasitic
infections encompassed by the invention include, but are not restricted to, those caused by
Plasmodium species, Schistosoma species, Leishmania species, Trypanosoma species,
Toxoplasma species and Giardia species.

The above compositions or vaccines may be administered in a manner compatible
with the dosage formulation, and in such amount as is therapeutically effective to relieve
patients of the disease or condition or as is prophylactically effective to prevent
incidence of the disease or condition in the patient. The dose administered to a patient, in
the context of the present invention, should be sufficient to effect a beneficial response in a
patient over time such as a reduction or cessation of blood loss. The quantity of the
composition or vaccine to be administered may depend on the subject to be treated
inclusive of the age, sex, weight and general health condition thereof. In this regard,
precise amounts of the composition or vaccine for administration will depend on the
judgement of the practitioner. In determining the effective amount of the composition or
vaccine to be administered in the treatment of a disease or condition, the physician may
evaluate the progression of the disease or condition over time. In any event, those of skill
in the art may readily determine suitable dosages of the composition or vaccine of the
invention.

In a preferred embodiment, DNA-based immunopotentiating agent (e.g., 100 µg)
is delivered intradermally into a patient at day 1 and at week 8 to prime the patient. A
recombinant poxvirus (e.g., at 10^7 pfu/mL) from which substantially the same
immunopotentiating agent can be expressed is then delivered intradermally as a booster at
weeks 16 and 24, respectively.

The effectiveness of the immunisation may be assessed using any suitable
technique. For example, CTL lysis assays may be employed using stimulated splenocytes
or peripheral blood mononuclear cells (PBMC) on peptide coated or recombinant virus
infected cells using 51Cr labelled target cells. Such assays can be performed using for
example primate, mouse or human cells (Allen et al., 2000, J. Immunol. 164(9): 4968-4978;
also Woodberry et al., infra). Alternatively, the efficacy of the immunisation may be
monitored using one or more techniques including, but not limited to, HLA class I
Tetramer staining - of both fresh and stimulated PBMCs (see for example Allen et al.,
supra), proliferation assays (Allen et al., supra), Elispot™ Assays and intracellular
IFN-gamma staining (Allen et al., supra), ELISA Assays - for linear B cell responses; and
Western blots of cell sample expressing the synthetic polynucleotides.

5. Computer related embodiments 

The design or construction of a synthetic polypeptide sequence or a synthetic
polynucleotide sequence according to the invention is suitably facilitated with the
assistance of a computer programmed with software, which inter alia fragments a parent
sequence into fragments, and which links those fragments together in a different
relationship relative to their linkage in the parent sequence. The ready use of a parent
sequence for the construction of a desired synthetic molecule according to the invention
requires that it be stored in a computer-readable format. Thus, in accordance with the
present invention, sequence data relating to a parent molecule (e.g., a parent polypeptide)
is stored in a machine-readable storage medium, which is capable of processing the data to
fragment the sequence of the parent molecule into fragments and to link together the
fragments in a different relationship relative to their linkage in the parent molecule.

Therefore, another embodiment of the present invention provides a machine-readable
data storage medium, comprising a data storage material encoded with machine
readable data which, when used by a machine programmed with instructions for using said
data, fragments a parent sequence into fragments, and links those fragments together in a
different relationship relative to their linkage in the parent sequence. In a preferred
embodiment of this type, a machine-readable data storage medium is provided that is
capable of reverse translating the sequence of a respective fragment to provide a nucleic
acid sequence encoding the fragment and to link together in the same reading frame each
of the nucleic acid sequences to provide a polynucleotide sequence that codes for a
polypeptide sequence in which said fragments are linked together in a different relationship
relative to their linkage in a parent polypeptide sequence.
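
By way of illustration only, the reverse translation and in-frame linking described above may be sketched in Python as follows. This is a minimal sketch and not the software of the invention: the table PREFERRED_CODON holds one commonly used mammalian codon per amino acid as a placeholder (it is not the table of Figure 12), and the fragment strings, the new order and the function names reverse_translate and link_in_frame are hypothetical.

```python
# Placeholder codon choices for illustration; NOT the table of Figure 12.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(fragment):
    """Return a nucleic acid sequence encoding one polypeptide fragment."""
    return "".join(PREFERRED_CODON[aa] for aa in fragment)


def link_in_frame(fragments, order):
    """Join the coding sequences of the fragments in a new order.

    Each fragment is encoded by whole codons, so simple concatenation
    keeps every fragment in the same reading frame.
    """
    return "".join(reverse_translate(fragments[i]) for i in order)


# Hypothetical parent fragments (in parent order) and a rearranged order.
fragments = ["MKV", "LAG", "TWY"]
new_order = [2, 0, 1]
dna = link_in_frame(fragments, new_order)
print(dna)
```

Because every fragment contributes a whole number of codons, the length of the linked sequence is always a multiple of three, whatever order is chosen.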

In another embodiment, the invention encompasses a computer for designing the
sequence of a synthetic polypeptide and/or a synthetic polynucleotide of the invention,
wherein said computer comprises: (a) a machine readable
data storage medium comprising a data storage material encoded with machine readable
data, wherein said machine readable data comprises the sequence of a parent polypeptide;
(b) a working memory for storing instructions for processing said machine-readable data;
(c) a central-processing unit coupled to said working memory and to said machine-readable
data storage medium, for processing said machine-readable data into said synthetic
polypeptide sequence and/or said synthetic polynucleotide; and (d) an output hardware
coupled to said central processing unit, for receiving said synthetic polypeptide sequence
and/or said synthetic polynucleotide.

In yet another embodiment, the invention contemplates a computer program
product for designing the sequence of a synthetic polynucleotide of the invention,
comprising code that receives as input the sequence of a parent polypeptide, code that
fragments the sequence of the parent polypeptide into fragments, code that reverse
translates the sequence of a respective fragment to provide a nucleic acid sequence
encoding the fragment, code that links together in the same reading frame each said nucleic
acid sequence to provide a polynucleotide sequence that codes for a polypeptide sequence
in which said fragments are linked together in a different relationship relative to their
linkage in the parent polypeptide sequence, and a computer readable medium that stores
the codes.

A version of these embodiments is presented in Figure 23, which shows a system
10 including a computer 11 comprising a central processing unit ("CPU") 20, a working
memory 22 which may be, e.g., RAM (random-access memory) or "core" memory, mass
storage memory 24 (such as one or more disk drives or CD-ROM drives), one or more
cathode-ray tube ("CRT") display terminals 26, one or more keyboards 28, one or more
input lines 30, and one or more output lines 40, all of which are interconnected by a
conventional bidirectional system bus 50.

Input hardware 36, coupled to computer 11 by input lines 30, may be
implemented in a variety of ways. For example, machine-readable data of this invention
may be inputted via the use of a modem or modems 32 connected by a telephone line or
dedicated data line 34. Alternatively or additionally, the input hardware 36 may comprise
CD-ROM drives or disk drives 24. In conjunction with display terminal 26,
keyboard 28 may also be used as an input device.

Output hardware 46, coupled to computer 11 by output lines 40, may similarly be
implemented by conventional devices. By way of example, output hardware 46 may
include CRT display terminal 26 for displaying a synthetic polynucleotide sequence or a
synthetic polypeptide sequence as described herein. Output hardware might also include a
printer 42, so that hard copy output may be produced, or a disk drive 24, to store system
output for later use.

In operation, CPU 20 coordinates the use of the various input and output devices
36, 46, coordinates data accesses from mass storage 24 and accesses to and from working
memory 22, and determines the sequence of data processing steps. A number of programs
may be used to process the machine readable data of this invention. Exemplary programs
may use for example the steps outlined in the flow diagram illustrated in Figure 24.
Broadly, these steps include (1) inputting at least one parent polypeptide sequence; (2)
optionally adding alanine spacers at the ends of each polypeptide sequence; (3)
fragmenting the polypeptide sequences into fragments (e.g., 30 amino acids long), which
are preferably overlapping (e.g., by 15 amino acids); (4) reverse translating the fragment to
provide a nucleic acid sequence for each fragment and preferably using for the reverse
translation first and second most translationally efficient codons for a cell type, wherein the
codons are preferably alternated out of frame with each other in the overlaps of
consecutive fragments; (5) randomly rearranging the fragments; (6) checking whether
rearranged fragments recreate at least a portion of a parent polypeptide sequence; (7)
repeating randomly rearranging the fragments when rearranged fragments recreate said at
least a portion; or otherwise (8) linking the rearranged fragments together to produce a
synthetic polypeptide sequence and/or a synthetic polynucleotide sequence; and (9)
outputting said synthetic polypeptide sequence and/or a synthetic polynucleotide sequence.
An example of an algorithm which uses inter alia the aforementioned steps is shown in
Figure 25. By way of example, this algorithm has been used for the design of synthetic
polynucleotides and synthetic polypeptides according to the present invention for Hepatitis
C 1a and for melanoma, as illustrated in Figures 26 and 27.

Figure 28 shows a cross section of a magnetic data storage medium 100 which can
be encoded with machine readable data, or set of instructions, for designing a synthetic
molecule of the invention, which can be carried out by a system such as system 10 of
Figure 23. Medium 100 can be a conventional floppy diskette or hard disk, having a
suitable substrate 101, which may be conventional, and a suitable coating 102, which may
be conventional, on one or both sides, containing magnetic domains (not visible) whose
polarity or orientation can be altered magnetically. Medium 100 may also have an opening
(not shown) for receiving the spindle of a disk drive or other data storage device 24. The
magnetic domains of coating 102 of medium 100 are polarised or oriented so as to encode
in a manner which may be conventional, machine readable data such as that described
herein, for execution by a system such as system 10 of Figure 23.

Figure 29 shows a cross section of an optically readable data storage medium 110
which also can be encoded with such machine-readable data, or set of instructions, for
designing a synthetic molecule of the invention, which can be carried out by a system such
as system 10 of Figure 23. Medium 110 can be a conventional compact disk read only
memory (CD-ROM) or a rewritable medium such as a magneto-optical disk, which is
optically readable and magneto-optically writable. Medium 110 preferably has a suitable
substrate 111, which may be conventional, and a suitable coating 112, which may be
conventional, usually on one side of substrate 111.

In the case of CD-ROM, as is well known, coating 112 is reflective and is
impressed with a plurality of pits 113 to encode the machine-readable data. The
arrangement of pits is read by reflecting laser light off the surface of coating 112. A
protective coating 114, which preferably is substantially transparent, is provided on top of
coating 112.

In the case of a magneto-optical disk, as is well known, coating 112 has no pits
113, but has a plurality of magnetic domains whose polarity or orientation can be changed
magnetically when heated above a certain temperature, as by a laser (not shown). The
orientation of the domains can be read by measuring the polarisation of laser light reflected
from coating 112. The arrangement of the domains encodes the data as described above.

In order that the invention may be readily understood and put into practical effect,
particular preferred non-limiting embodiments will now be described as follows.

EXAMPLES
EXAMPLE 1

Preparation of an HIV Savine

Experimental Protocol 
Plasmids

The plasmid pDNAVacc is ampicillin resistant and contains an expression
cassette comprising a CMV promoter and enhancer, a synthetic intron, a multiple cloning
site (MCS) and an SV40 poly A signal sequence (Thomson et al., 1998). The plasmid
pTK7.5 contains a selection cassette, a pox virus 7.5 early/late promoter and an MCS
flanked on either side by Vaccinia virus TK gene sequences.

Recombinant Vaccinia Viruses 

Recombinant Vaccinia viruses expressing the gag, env (US) and pol (LAI) genes
of HIV-1 were used as previously described and denoted VV-GAG, VV-POL, VV-ENV
(Woodberry et al., 1999; Kent et al., 1998).

Marker Rescue Recombination

Recombinant Vaccinia viruses containing Savine constructs were generated by
marker rescue recombination, using protocols described previously (Boyle et al., 1985).
Plaque purified viruses were tested for the TK phenotype and for the appropriate genome
arrangement by Southern blot and PCR.

Oligonucleotides

Oligonucleotides (50 nmol scale, desalted) were purchased from Life
Technologies. Short oligonucleotides were resuspended in 100 µL of water, their
concentration determined, then diluted to 20 µM for use in PCR or sequencing reactions.
Long oligonucleotides for splicing reactions were denatured for 5 minutes at 94°C in
20 µL of formamide loading buffer then 0.5 µL was gel purified on a 6% polyacrylamide gel.
Gel slices containing full-length oligonucleotides were visualised with ethidium bromide,
excised, placed in Eppendorf™ tubes, and combined with 200 µL of water before being
crushed using the plunger of a 1 mL syringe. Before being used in splicing reactions the
crushed gel was resuspended in an appropriate volume of buffer and 1-2 µL of the
resuspendate used directly in the splicing reactions.

Sequencing 

Sequencing was performed using Dye terminator sequencing reactions and 
analyzed by the Biomedical Resource Facility at the John Curtin School of Medical 
Research using an ABI automated sequencer. 

Restimulation of Lymphocytes from HIV Infected Patients

Two pools of recombinant Vaccinia viruses containing VV-AC1 + VV-BC1 (Pool
1) or VV-AC2 + VV-BC2 + VV-CC2 (Pool 2) were used to restimulate lymphocytes from
the blood samples of HIV-infected patients. Briefly, CTL lines were generated from
HIV-infected donor PBMC. A fifth of the total PBMC were infected with either Pool 1 or Pool 2
Vaccinia viruses then added back to the original cell suspension. The infected cell
suspension was then cultured with IL-7 for 1 week.

CTL Assays

Restimulated PBMCs were used as effectors in a standard 51Cr-release CTL assay.
Targets were autologous EBV-transformed lymphoblastoid cell lines (LCLs) infected with
the following viruses: Pool 1, Pool 2, VV-GAG, VV-POL or VV-ENV. Assay controls
included uninfected targets, targets infected with VV-lacZ (virus control) and K562 cells.

Results 

HIV Savine Design 

A main goal of the Savine strategy is to include as much protein sequence
information from a pathogen or cancer as possible in such a way that potential T cell
epitopes remain intact and so that the vaccine or therapy is extremely safe. An HIV Savine
is described herein not only to compare this strategy to other strategies but also to produce
an HIV vaccine that would provide the maximum possible population coverage as well as
catering for the major HIV clades.

A number of design criteria were first determined to exploit the many advantages
of using a synthetic approach. One advantage is that it is possible to use consensus protein
sequences to design these vaccines. Using consensus sequences for a highly variable virus
like HIV should provide better vaccine coverage because individual viral isolate sequences
may have lost epitopes which induce CTL against the majority of other viral isolates. Thus,
using the consensus sequences of each HIV clade rather than individual isolate sequences
should provide better vaccine coverage. Taking this one step further, a consensus sequence
that covers all HIV clades should theoretically provide better coverage than using just the
consensus sequences for individual clades. Before designing such a sequence, however, it
was decided that a more appropriate and focussed HIV vaccine might be constructed if the
various clades were first ranked according to their relative importance. To establish such a
ranking the following issues were considered: the current prevalence of each clade, the rate at
which each clade is increasing and the capacity of various regions of the world to cope
with the HIV pandemic (Figures 1 and 2). These criteria produced the following ranking:
clade E > clade A > clade C > clade B > clade D > other clades. Clades E and A were
considered to be almost equal since they are very similar except in their envelope protein
sequences, which differ considerably.

Another advantage of synthesising a designed sequence is that it is possible to
incorporate degenerate sequences into the design. In the case of HIV, this means that
more than one amino acid can be included at various positions to improve the ability of the
vaccine to cater for the various HIV clades and isolates. Coverage is improved because
mutations in different HIV clades and also in individual isolate sequences, while mostly
destroying specific T cell epitopes, can result in the formation of new potentially useful
epitopes nearby (Goulder et al., 1997). Incorporating degenerate amino acid sequences,
however, also means that more than one construct must be made and mixed together. The
number of constructs required depends on the frequency with which mutations are
incorporated into the design. While this approach requires the construction of additional
constructs, these constructs can be prepared from the same set of degenerate long
oligonucleotides, significantly reducing the cost of providing such considerable interclade
coverage.

A set of degeneracy rules was developed for the incorporation of amino acid
mutations into the design which meant that a maximum of eight constructs would be
required so that theoretically all combinations were present, as follows: 1) Two amino
acids at three positions (or less) within any group of nine amino acids (i.e., present in a
CTL epitope); 2) Three amino acids at one position and two at another (or not) within any
group of nine amino acids; 3) Four amino acids at one position and two at another (or not)
within any group of nine amino acids. The reason why these rules were applied to nine
amino acids (the average CTL epitope size) and not to larger stretches of amino acid
sequence to cater for class II restricted epitopes, is because class II-restricted epitopes
generally have a core sequence of nine amino acids in the middle which binds specifically
to class II MHC molecules with the extra flanking sequences stabilising binding, by
associating with either side of class II MHC antigens in a largely sequence independent
manner (Brown et al., 1993).
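
As a worked illustration of the three degeneracy rules, the following Python sketch checks every group of nine consecutive positions of a degenerate protein sequence. It is offered as one reading of the rules rather than as the procedure actually used: the representation of a sequence as a list of strings (each string listing the amino acids allowed at that position), the names ALLOWED_PATTERNS, violations and max_constructs, and the two example sequences are all hypothetical.

```python
from math import prod

# Numbers of alternatives (largest first) permitted at the degenerate
# positions of any nine-residue group, as read from rules 1) to 3).
ALLOWED_PATTERNS = {(), (2,), (2, 2), (2, 2, 2), (3,), (3, 2), (4,), (4, 2)}


def window_ok(window):
    """True if the degenerate positions in one window fit an allowed pattern."""
    counts = tuple(sorted((len(p) for p in window if len(p) > 1), reverse=True))
    return counts in ALLOWED_PATTERNS


def violations(positions, width=9):
    """Start indices of the windows of `width` positions that break the rules."""
    last = max(len(positions) - width, 0)
    return [s for s in range(last + 1) if not window_ok(positions[s:s + width])]


def max_constructs(positions, width=9):
    """Largest number of amino acid combinations found in any window."""
    last = max(len(positions) - width, 0)
    return max(prod(len(p) for p in positions[s:s + width]) for s in range(last + 1))


# Each string lists the residues allowed at one position (made-up examples).
ok = ["A", "KR", "L", "T", "IV", "G", "S", "DE", "P", "Q"]
bad = ["A", "KR", "L", "ST", "IV", "G", "S", "DE", "P", "Q"]

print(violations(ok), max_constructs(ok))
print(violations(bad), max_constructs(bad))
```

Every allowed pattern multiplies out to eight combinations or fewer (2 x 2 x 2, 3 x 2 or 4 x 2), which is consistent with the stated maximum of eight constructs.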

Using the HIV clade ranking described above, the amino acid degeneracy rules
and in some situations the similarity between amino acids, a degenerate consensus protein
sequence was designed for each HIV protein using the consensus protein sequences for
each HIV clade compiled by the Los Alamos HIV sequence database (Figures 3-11) (HIV
Molecular Immunology Database, 1997). It is important to note that in some situations the
order in which each of the above design criteria was applied was altered. Each time this
was done, however, the primary goal was to increase the ability of the Savine to cater for
interclade differences. Two isolate sequences, GenBank accession U51189 and U46016,
for clade E and clade C, respectively, were used when a consensus sequence for some HIV
proteins from these two clades was unavailable (Gao et al., 1996; Salminen et al., 1996).
The design of a consensus sequence for the hypervariable regions of the HIV envelope
protein and in some cases between these regions (hypervariable regions 1-2 and 3-5) was
difficult and so these regions were excluded from the vaccine design.

Once a degenerate consensus sequence was designed for each HIV protein, an
approach was then determined for incorporating all the protein sequences safely into the
vaccine. One convenient approach to ensure that a vaccine will be safe is to systematically
fragment and randomly rearrange the protein sequences, thus abrogating or
otherwise altering their structure and function. The protein sequences still have to be
immunologically functional, however, meaning that the process used to fragment the
sequences should not destroy potential epitopes. To decide on the best approach for
systematically fragmenting protein sequences, the main criteria used were the size of T cell
epitopes and their processing requirements. Class I-restricted T cell epitopes are 8-10
amino acids long and generally require 2-3 natural flanking amino acids to ensure their
efficient processing and presentation if placed next to unnatural flanking residues (Del Val
et al., 1991; Thomson et al., 1995). Class II-restricted T cell epitopes range between 12-25
amino acids long and do not appear to require natural flanking residues for processing;
however, it is difficult to rule out a role for natural flanking residues in all cases due to the
complexity of their processing pathways (Thomson et al., 1998). Also, class II-restricted
epitopes, despite being larger than CTL epitopes, generally have a core sequence of 9-10
amino acids, which binds to MHC molecules in a sequence specific fashion. Thus, based
on current knowledge, it was decided that an advantageous approach was to overlap the
fragments by at least 15 amino acids to ensure that potential epitopes which might lie
across fragment boundaries are not lost and to ensure that CTL epitopes near fragment
boundaries, that are placed beside or near inhibitory amino acids in adjacent fragments, are
processed efficiently. In deciding the optimal fragment size, the main criteria used were
that size had to be small enough to cause the maximum disruption to the structure and
function of proteins but large enough to cover the sequence information as efficiently as
possible without any further unnecessary duplication. Based on these criteria the fragments
would be twice the overlap size, in this case 30 amino acids long.
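
The effect of choosing a fragment size of twice the overlap can be checked numerically. The short Python sketch below, which is illustrative only, tiles a sequence with 30-residue fragments overlapping by 15 and asks what is the longest stretch of residues that is guaranteed to lie wholly inside at least one fragment wherever it starts. The function names and the 120-residue test length are hypothetical.

```python
def fragment_spans(length, size=30, overlap=15):
    """(start, end) index pairs of the fragments tiling `length` residues."""
    step = size - overlap
    return [(s, min(s + size, length))
            for s in range(0, max(length - overlap, 1), step)]


def every_window_covered(length, width, size=30, overlap=15):
    """True if every stretch of `width` residues lies inside some fragment."""
    spans = fragment_spans(length, size, overlap)
    return all(
        any(start <= p and p + width <= end for start, end in spans)
        for p in range(length - width + 1)
    )


# Longest stretch that is always intact in at least one fragment.
longest = max(w for w in range(1, 31) if every_window_covered(120, w))
print(longest)
```

On this count the guaranteed stretch is the overlap plus one residue, i.e. 16 amino acids, which is enough to hold a 10-residue class I-restricted epitope together with three natural flanking residues on each side.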

The designed degenerate protein sequences were then separated into fragments 30
amino acids long and overlapping by fifteen amino acids. Two alanine amino acids were
also added to the start and end of the first and last fragment for each protein or envelope
protein segment to ensure these fragments were not placed directly adjacent to amino acids
capable of blocking epitope processing (Del Val et al., 1991). The next step was to reverse
translate each protein sequence back into DNA. Duplicating DNA sequences was avoided
when constructing DNA sequences encoding a tandem repeat of identical or homologous
amino acid sequences to maximise expression of the Savine. In this regard, the first and
second most commonly used mammalian codons (shown in Figure 12) were assigned to
amino acids in these repeat regions, wherein a first codon was used to encode an amino
acid in one of the repeated sequences and wherein a second but synonymous codon was
used for the other repeated sequence (e.g., see the gag HIV protein in Figure 13). To cater
for the designed amino acid mutations more than one base was assigned to some positions
using the IUPAC DNA codes without exceeding three base variations (eight
possible combinations) in any group of 27 bases (Figure 12). Where a particular
combination of amino acids could not be incorporated, because too many degenerate bases
would be required, some or all of the amino acid degeneracy was removed according to the
protein consensus design rules outlined above. Also, the degenerate codons were checked
to determine if they could encode a stop codon. If stop codons could not be avoided, then
the amino acid degeneracy was also simplified, again according to the protein consensus
design rules outlined above.
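
The two checks described above (the limit on combinations in any group of 27 bases, and the test for degenerate codons that could encode a stop codon) may be sketched in Python as follows. The IUPAC ambiguity codes and the three standard stop codons are standard; everything else, including the function names and the example sequences, is hypothetical and is not taken from Figure 12.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand(seq):
    """All unambiguous sequences represented by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


def can_encode_stop(codon):
    """True if any expansion of a degenerate codon is a stop codon."""
    return any(c in STOP_CODONS for c in expand(codon))


def max_combinations(dna, width=27):
    """Largest number of sequence combinations in any window of `width` bases."""
    last = max(len(dna) - width, 0)
    return max(prod(len(IUPAC[b]) for b in dna[s:s + width])
               for s in range(last + 1))


# Made-up 27-base example with three two-fold degenerate positions.
dna = "AARGAYCTS" + "GCC" * 6
print(max_combinations(dna))       # combinations within the 27-base group
print(can_encode_stop("TRG"))      # TRG expands to TAG and TGG
print(can_encode_stop("AAR"))      # AAR expands to AAA and AAG
```

A sequence that passes both checks stays within the eight-combination limit and cannot introduce a premature stop codon in any of the constructs that the degenerate oligonucleotides represent.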

The designed DNA segments were then scrambled randomly and joined to create
twenty-two subcassettes approximately 840 bp in size. Extra DNA sequences incorporating
sites for one of the cohesive restriction enzymes XbaI, SpeI, AvrII or NheI and 3 additional
base pairs (to cater for premature Taq polymerase termination) were then added to each
end of each subcassette (Figure 14). Some of these extra DNA sequences also contained
the cohesive restriction sites for SalI or XhoI, Kozak signal sequences and start or stop
codons to enable the subcassettes to be joined and expressed either as three large cassettes
or one full length protein (Figures 14 and 15).
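
The reason these enzymes can be used interchangeably for joining is that they leave identical single-stranded overhangs. The sketch below illustrates this with the published recognition sites of the six enzymes; the helper names are hypothetical and only the top strand is modelled.

```python
# Recognition site and top-strand cut position (number of bases before the cut).
SITES = {
    "XbaI": ("TCTAGA", 1),
    "SpeI": ("ACTAGT", 1),
    "AvrII": ("CCTAGG", 1),
    "NheI": ("GCTAGC", 1),
    "SalI": ("GTCGAC", 1),
    "XhoI": ("CTCGAG", 1),
}


def overhang(enzyme):
    """5' overhang left by a palindromic site cut symmetrically on both strands."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a, b):
    """True when the two enzymes produce ends that can anneal and be ligated."""
    return overhang(a) == overhang(b)


def ligated_junction(left, right):
    """Top-strand sequence formed when a `left`-cut end is ligated to a `right`-cut end."""
    left_site, left_cut = SITES[left]
    right_site, right_cut = SITES[right]
    return left_site[:left_cut] + right_site[right_cut:]


def can_be_recut(left, right):
    """True when the ligated junction regenerates one of the listed sites."""
    return ligated_junction(left, right) in {site for site, _ in SITES.values()}


print(overhang("XbaI"), overhang("SpeI"))  # both CTAG
print(ligated_junction("XbaI", "SpeI"))    # hybrid junction
print(can_be_recut("XbaI", "SpeI"))        # hybrid site is not recognised by any listed enzyme
```

Joining ends cut by two different enzymes of the same group gives a hybrid junction that none of the listed enzymes cuts, which is consistent with the stepwise ligation described later, although the source does not spell out this rationale.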

In designing the HIV Savine one issue that required investigation was whether
such a large DNA molecule would be fully expressed and whether epitopes encoded near
the end of the molecule would be efficiently presented to the immune system. The
inventors also wished to show that mixing two or more degenerate Savine constructs
together could induce T cell responses that recognise mutated sequences. To examine both
issues, DNA coding for a degenerate murine influenza nucleoprotein CTL epitope, NP365-373,
which differs by two amino acids at positions 71 and 72 in influenza strain A/PR/8/34
compared to the A/NT/60/68 strain and is restricted by H2-Db, was inserted before the last
stop codon at the end of the HIV Savine design (Figure 15). An important and unusual
characteristic of both of these naturally occurring NP365-373 sequences, which enabled
the present inventors to examine the effectiveness of incorporating mutated sequences, is
that they generate CTL responses which do not cross-react with the alternate sequence
(Townsend et al., 1986). This is an unusual characteristic because epitopes not destroyed
by mutation usually induce CTL responses that cross-react.




Up to ten long oligonucleotides up to 100 bases long and two short amplification
oligonucleotides were synthesised to enable construction of each subcassette (Life
Technologies). In designing each oligonucleotide the 3' end and in most cases also the 5'
end had to be either a 'c' or a 'g' to ensure efficient extension during PCR splicing. The
overlap region for each long oligonucleotide was designed to be at least 16 bp with
approximately 50% G/C content. Oligonucleotide overlaps were also not placed where
degenerate DNA bases coded for degenerate amino acids, to avoid splicing difficulties
later. Where this was too difficult some degenerate bases were removed according to the
protein consensus design rules outlined above and indicated in Figure 12. Figure 16 shows
an example of the oligonucleotide design for each subcassette.
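
The oligonucleotide design rules above lend themselves to a simple automated check. The following sketch is illustrative only: it assumes the required terminal bases are C or G, takes the overlap to be the 3' end of the oligonucleotide, and uses a hypothetical tolerance of 15 percentage points around the 50% G/C target, none of which is specified in the text.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_oligo(oligo, overlap_len, min_overlap=16, gc_target=0.5, gc_tolerance=0.15):
    """Evaluate one long oligonucleotide against the stated design rules.

    overlap_len  -- number of 3' bases shared with the next oligonucleotide
    gc_tolerance -- hypothetical allowance around the ~50% G/C target
    """
    oligo = oligo.upper()
    overlap = oligo[-overlap_len:]
    return {
        "three_prime_clamp": oligo[-1] in "GC",
        "five_prime_clamp": oligo[0] in "GC",
        "overlap_long_enough": overlap_len >= min_overlap,
        "overlap_gc_ok": abs(gc_fraction(overlap) - gc_target) <= gc_tolerance,
    }


# Hypothetical oligonucleotide: 14-base body followed by a 16-base overlap.
example = "GATTACAGATTACA" + "ACGTACGTACGTACGC"
print(check_oligo(example, overlap_len=16))
```

Because the 5' clamp was required only "in most cases", a failed `five_prime_clamp` would be a warning rather than a rejection.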

Construction of the HIV Savine 

Five of each group of ten designed oligonucleotides were spliced together using
stepwise asymmetric PCR (Sandhu et al., 1992) and Splicing by Overlap Extension
(SOEing) (Figure 17a). Each subcassette was then PCR amplified, cloned into
pBluescript™ II KS- using BamHI/EcoRI and 16 individual clones sequenced. Mutations,
deletions and insertions were present in the large majority of the clones for each
subcassette, despite acrylamide gel purification of the long oligonucleotides. In order to
construct a functional Savine with minimal mutations, two clones for each subcassette with
no insertions or deletions, and hence a complete open reading frame, and with minimal
numbers of non-designed mutations, were selected from the sixteen available. The
subcassettes were then excised from their plasmids and joined by stepwise PCR-amplified
ligation using the polymerase blend Elongase™ (Life Technologies), T4 DNA ligase and the
cohesive restriction enzymes XbaI/SpeI/AvrII/NheI to generate two copies of cassettes A,
B and C as outlined in Figure 14 and shown in Figure 17b. Predicted sequences for these
cassettes are shown in Figure 30. Each cassette was then reamplified by PCR with
Elongase™, cloned into pBluescript™ II KS- and 3 of the resulting plasmid clones
sequenced using 12 of the 36 sequencing primers designed to cover the full length
construct. Clones with minimal or no further mutations were selected for transfer into
plasmids for DNA vaccination or used to make recombinant poxviruses. A summary of the
number of designed and non-designed mutations in each Savine construct is presented in
Table 1.

TABLE 1

Summary of mutations

Cassette A      1896    249    124    107    5 (AC1), 8 (AC2)
Cassette B      1184    260    130    124    11 (BC1), 4 (BC2)
Cassette C      1969    276    138    121    10 (CC1), 14 (CC2)
Full length     5742    785    392    352    26 (FL1), 26 (FL2)

Summary of the mutations present in the two full-length clones constructed as determined by
sequencing. Includes the number of mutations designed, expected and actually present in the 2 clones and the
number of non-designed mutations in each cassette and full-length clone.
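
As a rough consistency check on Table 1, the per-cassette figures can be summed and the fraction of expected designed mutations actually recovered can be computed. The column assignment used here (designed, expected, present, non-designed) is an assumption taken from the order given in the table caption, since the column headings are not reproduced above.

```python
# Values transcribed from Table 1; column meanings are assumed from the caption.
TABLE_1 = {
    "Cassette A": {"designed": 249, "expected": 124, "present": 107, "non_designed": (5, 8)},
    "Cassette B": {"designed": 260, "expected": 130, "present": 124, "non_designed": (11, 4)},
    "Cassette C": {"designed": 276, "expected": 138, "present": 121, "non_designed": (10, 14)},
}
FULL_LENGTH = {"designed": 785, "expected": 392, "present": 352, "non_designed": (26, 26)}


def column_total(column):
    """Sum one numeric column over the three cassettes."""
    return sum(row[column] for row in TABLE_1.values())


def recovered_fraction(row):
    """Designed mutations actually present, relative to the number expected."""
    return row["present"] / row["expected"]


for name, row in TABLE_1.items():
    print(f"{name}: {recovered_fraction(row):.1%} of expected designed mutations present")
print(f"Full length: {recovered_fraction(FULL_LENGTH):.1%}")

# Non-designed mutations per clone series (first and second clone of each cassette).
clone_1_total = sum(row["non_designed"][0] for row in TABLE_1.values())
clone_2_total = sum(row["non_designed"][1] for row in TABLE_1.values())
print(clone_1_total, clone_2_total)
```

On this reading the cassette rows add up to the full-length row for these four columns, and each construct carries somewhat fewer designed mutations than expected.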



HIV Savine DNA vaccines and Recombinant Vaccinia viruses

To test the immunological effectiveness of the HIV Savine constructs the cassette
sequences were transferred into DNA vaccine and poxvirus vectors. These vectors were
used either separately in the immunological assays described below or together in a
'prime-boost' protocol, which has been shown previously to generate strong T cell
responses in vivo (Kent et al., 1997).

DNA Vaccination plasmids were constructed by excising the cassettes from the
selected plasmid clones with XbaI/XhoI (cassette A) or XbaI/SalI (cassettes B and C) and
ligating them into pDNAVacc cut with XbaI/XhoI to create pDVAC1, pDVAC2, pDVBC1,
pDVBC2, pDVCC1 and pDVCC2, respectively (Figure 18a). These plasmids were then
further modified by cloning into their XbaI site a DNA fragment excised using XbaI/AvrII
from pTUMERA2 and encoding a synthetic endoplasmic reticulum (ER) signal sequence
from the Adenovirus E1A protein (Persson et al., 1980) (Figure 18a). ER signal sequences
have been shown previously to enhance the presentation of both CTL and T helper
epitopes in vivo (Ishioka, G.Y., 1999; Thomson et al., 1998). The plasmids pDVERAC1,
pDVERBC1, pDVERCC1 and pDVERAC2, pDVERBC2, pDVERCC2 were then mixed
together to create plasmid pool 1 and pool 2, respectively. Each plasmid pool collectively
encodes one copy of the designed full-length HIV Savine.

Plasmids to generate recombinant Vaccinia viruses which express HIV Savine
sequences were constructed by excising the various HIV Savine cassettes from the selected
plasmid clones using BamHI/XhoI (cassette A) or BamHI/SalI (cassettes B and C) and
cloning them into the marker rescue plasmid, pTK7.5, cleaved with BamHI/SalI. These
pTK7.5-derived plasmids were then used to generate recombinant Vaccinia viruses by marker
rescue recombination using established protocols (Boyle et al., 1985) to generate W-AC1,
W-AC2, W-BC1, W-BC2, W-CC1 and W-CC2 (Figure 18b).

Two further DNA vaccine plasmids were constructed, each encoding a version of
the full length HIV Savine (Figure 18c). Briefly, the two versions of cassette B were
excised with XhoI and cloned into the corresponding selected plasmid clones containing
cassette A sequences that were cut with XhoI/SalI to generate pBSAB1 and pBSAB2,
respectively. The joined A/B cassettes in pBSAB1 and pBSAB2 were excised with
XbaI/XhoI and cloned into pDVCC1 and pDVCC2, respectively, which had been cleaved with
XbaI/XhoI, to generate pDVFL1 and pDVFL2. These were then further modified to contain
an ER signal sequence using the same cloning strategy as outlined in Figure 18a.

Restimulation of HIV specific lymphocytes from HIV infected patients 

The present inventors examined the capacity of the HIV Savine to restimulate
HIV-specific polyclonal CTL responses from HIV-infected patients. PBMCs from three
different patients were restimulated in vitro with two HIV Savine Vaccinia virus pools
(Pool 1 included W-AC1 and W-BC1; Pool 2 included W-AC2, W-BC2 and W-CC2)
then used in CTL lysis assays against LCLs infected either with one of the Savine Vaccinia
virus pools or Vaccinia viruses which express gag, env or pol. Figure 19 clearly shows
that in all three assays, both HIV Savine viral pools restimulated HIV-specific CTL
responses which could recognise targets expressing whole natural HIV antigens and not
targets which were uninfected or infected with the control Vaccinia virus. Furthermore, in
all three cases, both pools restimulated responses that recognised all three natural HIV
antigens. This result suggests that the combined Savine constructs will provide broader
immunological coverage than single antigen based vaccine approaches. The level of lysis
in each case of targets infected with Savine viral pools was significantly higher than the
lysis recorded for any other infected target. This probably reflects the combined CTL
responses to gag, pol, and env plus other HIV antigens not analysed here but whose
sequences are also incorporated into the Savine constructs.

CTL recognition of each HIV antigen is largely controlled by each patient's HLA
background, hence the pattern of CTL lysis for whole HIV antigens is different in each
patient. Interestingly, this CTL lysis pattern did not change when the second Savine
Vaccinia virus pool was used for CTL restimulation. In these assays, therefore, the
inventors were unable to demonstrate clear differences between pools 1 and 2, despite pool
1 lacking a Vaccinia virus expressing cassette CC1 and despite the many amino acid
differences between the A and B cassettes in each pool (see Table 1).

From the foregoing, the present inventors have developed a novel
vaccine/therapeutic strategy. In one embodiment, pathogen or cancer protein sequences are
systematically fragmented, reverse translated back into DNA, rearranged randomly then
joined back together. The designed synthetic DNA sequence is then constructed using long
oligonucleotides and can be transferred into a range of delivery vectors. The vaccine
vectors used here were DNA vaccine plasmids and recombinant poxvirus vectors which
have been previously shown to elicit strong T cell responses when used together in a
'prime-boost' protocol (Kent et al., 1997). An important advantage of scrambled antigen
vaccines or 'Savines' is that the amount of starting sequence information for the design can
be easily expanded to include the majority of the protein sequences from a pathogen or for
cancer, thereby providing the maximum possible vaccine or therapy coverage for a given
population.
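
The fragment, scramble and reverse-translate steps summarised above can be sketched in a few lines. This is a simplified illustration, not the algorithm of Figure 25: the function names are hypothetical, the codon table is one illustrative choice of a frequently used mammalian codon per amino acid rather than the table of Figure 12, and degenerate positions, repeat handling and restriction sites are all omitted. Fragment size (30) and overlap (15) follow the values given for the HIV Savine.

```python
import random

# Illustrative codon choices only (an assumption, not the Figure 12 table).
CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def fragment(protein, size=30, overlap=15):
    """Cut a protein into overlapping fragments, with two alanines padding each end."""
    padded = "AA" + protein + "AA"
    step = size - overlap
    stop = max(len(padded) - overlap, 1)
    return [padded[start:start + size] for start in range(0, stop, step)]


def scramble(fragments, seed=0):
    """Return the fragments in a reproducible random order."""
    shuffled = list(fragments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def reverse_translate(peptide):
    """Encode a peptide as DNA using the illustrative codon table."""
    return "".join(CODON[residue] for residue in peptide)


def design_savine(protein, seed=0):
    """Fragment, scramble and reverse translate; returns (fragment order, DNA)."""
    order = scramble(fragment(protein), seed)
    return order, reverse_translate("".join(order))


# Hypothetical 60-residue input built from the 20 standard amino acids.
parent = "MGARSVLKDEWFHIPQTNYC" * 3
order, dna = design_savine(parent)
print(len(order), len(dna))
```

Because adjacent fragments share 15 residues, every stretch of up to 16 residues in the padded parent is still contiguous in at least one fragment after scrambling, while the parent's overall order is lost.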

An embodiment of the systematic fragmentation approach described herein was
based on the size and processing requirements for T cell epitopes and was designed to
cause maximal disruption to the structure and function of protein sequences. This
fragmentation approach ensures that the maximum possible range of T cell epitopes will be
present from any incorporated protein sequence without the protein being functional and
able to compromise vaccine safety.

Another important advantage of Savines is that consensus protein sequences can
be used for their design. This feature is only applicable when the design needs to cater for
pathogen or cancer antigens whose sequence varies considerably. HIV is a highly
mutagenic virus, hence this feature was utilised extensively to design a vaccine which has
the potential to cover not only field isolates of HIV but also the major HIV clades involved
in the current HIV pandemic. To construct the HIV Savine, one set of long
oligonucleotides was synthesised, which included degenerate bases in such a way that 8
constructs are theoretically required for the vaccine to contain all combinations in any
stretch of 9 amino acids. The inventors believe that this approach can be improved for the
following reasons: 1) While degenerate bases should be theoretically equally represented,
in practice some degenerate bases were biased towards one base or the other, leading to a
lower than expected frequency of the designed mutations in the two full length HIV
Savines which were constructed (see Table 1). 2) Only sequence combinations actually
present in the HIV clade consensus sequences are required to get full clade coverage,
hence the number of full length constructs needed could be reduced. To reduce the number
of constructs, however, separate sets of long oligonucleotides would have to be synthesised,
significantly increasing the cost, time and effort required to generate a vaccine capable of
such considerable vaccine coverage.

A significant problem during the construction of the HIV Savine synthetic DNA
sequence was the incorporation of non-designed mutations. The most serious types of
mutations were insertions, deletions or those giving rise to stop codons, all of which
change the frame of the synthesised sequences and/or cause premature truncation of the
Savine proteins. These types of mutation were removed during construction of the HIV
Savines by sequencing multiple clones after subcassette and cassette construction and
selecting functional clones. The major source of these non-designed mutations was in the
long oligonucleotides used for Savine synthesis, despite their gel purification. This
problem could be reduced by making the initial subcassettes smaller, thereby reducing the
possibility of corrupted oligonucleotides being incorporated into each subcassette clone.
The second major cause of non-designed mutations was the large number of PCR cycles
required for the PCR and ligation-mediated joining of the subcassettes. Including extra
sequencing and clone selection steps during the subcassette joining process should help to
reduce the frequency of non-designed mutations in future constructs. Finally, another
method that could help reduce the frequency of such mutations at all stages is to use
resolvase treatment. Resolvases are bacteriophage-encoded endonucleases which recognise
disruptions to double stranded DNA and are primarily used by bacteriophages to resolve
Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). T7 endonuclease I has already
been used by the present inventors in synthetic DNA constructions to recognise mutations
and cleave corrupted dsDNA to allow gel purification of correct sequences. Cleavage of
corrupted sequences occurs because, after a simple denaturing and hybridisation step,
mutated DNA hybridises to correct DNA sequences, resulting in a mispairing of DNA
bases which can be recognised by the resolvase. This method resulted in a 50%
reduction in the frequency of errors. Further optimisation of this method and the use of a
thermostable version of this type of enzyme could further reduce the frequency of errors
during long Savine construction.

Two pools of Vaccinia viruses expressing Savine cassettes were both shown to
restimulate HIV-specific responses from three different patients infected with B clade HIV
viruses. These results provide a clear indication that the HIV Savine should provide broad
coverage of the population because each patient had a different HLA pattern yet both pools
were able to restimulate HIV-specific CTL responses in all three patients against all three
natural HIV proteins tested. Also, both pools were shown to restimulate virtually identical
CTL patterns in all three patients. This result was unexpected because some responses
should have been lost or gained due to the amino acid differences between the two pools
and because Pool 1 is only capable of expressing 2/3 of the full length HIV Savine. There
are two suggested reasons why the pattern of CTL lysis was not altered between the two
viral pools. Firstly, the sequences in the Savine constructs are nearly all duplicated because
the fragment sequences overlap. Hence the loss of a third of the Savine may not have
excluded sufficient T cell epitopes for differences to be detected in only three patient
samples against only three HIV proteins. Secondly, while mutations often destroy T cell
epitopes, if they remain functional, then the CTL they generate frequently can recognise
alternate epitope sequences. Taken together, these findings indirectly suggest that combining
only two Savine constructs may provide robust multiclade coverage. Further experiments
are being carried out to directly examine the capacity of the HIV Savine to stimulate CTL
generated by different strains of HIV virus. The capacity of the two HIV-1 Savine
Vaccinia vector pools to stimulate CD4+ T cell HIV-1 specific responses from infected
patients was also tested (Figure 20). Both patients showed significant proliferation of
CD4+ T cells, although the two pools did not show consistent patterns, suggesting that the two
pools may provide wider vaccine coverage than using either pool independently.

The present inventors have generated a novel vaccine strategy, which has been
used to generate what the inventors believe to be the most effective HIV candidate vaccine
to date. The inventors have used this vaccine to immunise naive mice. Figure 21 shows
conclusively that the HIV-1 Savine described above can generate a Gag and Nef CTL
response in naive mice. It should be noted, however, that the Nef CTL epitope appeared to
exist only in Pool 1 since it was not restimulated by Pool 2. This is further proof of the
utility of combining HIV-1 Savine Pool 1 and Pool 2 components together to provide
broader vaccine coverage.

The HIV-1 Savine Vaccinia vectors have also been used to restimulate in vivo
HIV-1 responses in pre-immune M. nemestrina monkeys. These experiments (Figure 22)
showed, by IFN-γ ELISPOT and CD69 expression on both CD4 and CD8 T cells, that the
ability of the HIV-1 Savine to restimulate HIV-1 specific responses in vivo is equivalent
to, or perhaps better than, that of another HIV-1 candidate vaccine.

This is a generic strategy able to be applied to many other human infections or
cancers where T-cell responses are considered to be important for protection or recovery.
With this in mind the inventors have begun constructing Savines for melanoma, cervical
cancer and Hepatitis C. In the case of melanoma, the majority of the currently identified
melanoma antigens have been divided into two groups, one containing antigens associated
with melanoma and one containing differentiation antigens from melanocytes, which are
often upregulated in melanomas. Two Savine constructs are presently being constructed to
cater for these two groups. The reason for making the distinction is that treatment of
melanoma might first proceed using the Savine that incorporates fragments of melanoma
specific antigens only. If this Savine fails to control some metastases then the less specific
Savine containing the melanocyte-specific antigens can then be used. It is important to
point out that other cancers also express many of the antigens specific to melanomas, e.g.,
testicular and breast cancers. Hence the melanoma specific Savine may have therapeutic
benefits for other cancers.

A small Savine is also being constructed for cervical cancer. This Savine will
contain two antigens, E6 and E7, from two strains of human papilloma virus (HPV),
HPV-16 and HPV-18, directly linked with causing the majority of cervical cancers worldwide.
There is a large number of sequence differences in these two antigens between the two
strains, which would normally require two Savines to be constructed. However, since this
Savine is small, the antigen fragments from both strains are being scrambled together.
While it is normally better for the Savine approach to include all or a majority of the
antigens from a virus, in this case only E6 and E7 are expressed during viral latency or in
cervical carcinomas. Hence in the interests of simplicity, the rest of the HPV genome will
not be included, although all HPV antigens would be desirable in a Savine against genital
warts.

Two Savines have also been constructed for two strains of hepatitis C, a major
cause of liver disease in the world. Hepatitis C is similar to HIV in the requirements for a
vaccine or therapeutic. However, the major hepatitis C strains share significantly lower
homology, 69-79%, with one another than do the various HIV clades. To cater for this the
inventors have decided to construct two separate constructs to cater for the two major
strains present in Australia, types 1a and 3a, which together cause approximately 80-95% of
hepatitis C infections in this country. Both constructs will be approximately the same size
as the HIV Savine but will be blended together into a single vaccine or therapy.

Overall it is believed that the Savine vaccine strategy is a generic technology
likely to be applied to a wide range of human diseases. It is also believed that because it is
not necessary to characterise each antigen, this technology will be actively applied to
animal vaccines as well, where research into vaccines or therapies is often inhibited by the
lack of specific reagents, modest research budgets and poor returns on animal vaccines.

EXAMPLE 2 
Hepatitis C Savine 

Synthetic immunomodulatory molecules have also been designed for treating
Hepatitis C. In one example, the algorithm of Figure 25 was applied to a consensus
polyprotein sequence of Hepatitis C 1a to facilitate its segmentation into overlapping
segments (30 aa segments overlapping by 15 aa), the rearrangement of these segments into
a scrambled order and the output of Savine nucleic acid and amino acid sequences, as
shown in Figure 26. Exemplary DNA cassettes (A, B and C) are also shown in Figure 26,
which contain suitable restriction enzyme sites at their ends to facilitate their joining into a
single expressible open reading frame.

EXAMPLE 3
Melanoma Savine

The algorithm of Figure 25 was also applied to melanocyte differentiation
antigens (gp100, MART, TRP-1, Tyros, Trp-2, MC1R, MUC1F and MUC1R) and to
melanoma specific antigens (BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME,
TRP2IN2, NYNSO1a, NYNSO1b and LAGE1), as shown in Figure 27, to provide separate
Savine nucleic acid and amino acid sequences for treating or preventing melanoma.

EXAMPLE 4 

Resolvase Repair Experiment 

A resolvase can be used advantageously to repair errors in polynucleotides. The
following procedure outlines resolvase repair of a synthetic 340 bp fragment in which
DNA errors were common.

Method 

The 340 bp fragment was PCR amplified and gel purified on a 4% agarose gel.
After spin purifying, 10 µL of the eluate, corresponding to approximately 100 ng, was
subjected to the resolvase repair treatment. The rest of the DNA sample was stored for later
cloning as the untreated control.

2 µL of 10x PCR buffer, 2 µL of 20 mM MgCl2 and 6 µL of MilliQ™ water
(MQW) and Taq DNA polymerase were added to the 10 µL DNA sample. The mixture
was subjected to the following thermal profile: 95°C for 5 min, 65°C for 30 min, cooled and
held at 37°C. Five µL of 10x T7 endonuclease I buffer, 8 µL of the 1/50 T7 endonuclease I
enzyme stock and 17 µL of MQW were added, mixed and incubated for 30 min. Loading buffer
was added to the sample and the sample was electrophoresed on a 4% agarose gel. A faint
band corresponding to the full length fragment was excised and subjected to 15 further
cycles of PCR. The amplified fragment was agarose gel purified and, along with the
untreated DNA sample, cloned into pBluescript. Eleven plasmid clones for each DNA
sample were sequenced and the number and type of errors compared (see Tables 2 and 3).

Buffers were as follows:

10x T7 endonuclease buffer

2.5 mL 1 M TRIS pH 7.8, 0.5 mL 1 M MgCl2, 25 µL 1 M DTT, 50 µL 10 mg/mL BSA,
2 mL MQW made up to a total of 5 mL.

T7 endonuclease I stock

Concentrated sample of enzyme prepared by, and obtained from, Jeff Babon (St
Vincent's Hospital) was diluted 1/50 using the following dilution buffer: 50 µL 1 M TRIS
pH 7.8, 0.1 µL 1 M EDTA pH 8, 5 µL 100 mM glutathione, 50 µL 10 mg/mL BSA, 2.3 mL
MQW, 2.5 mL glycerol made up to a total of 5 mL.
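
The volumes given in the method can be checked with simple dilution arithmetic (C1V1 = C2V2). The sketch below is illustrative only: the helper name is hypothetical, all volumes are expressed in microlitres, and the volume of Taq DNA polymerase, which is not stated, is ignored.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after diluting a stock into a final volume (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Heteroduplex formation step: DNA + 10x PCR buffer + 20 mM MgCl2 + water
# (unstated Taq polymerase volume ignored).
annealing_ul = 10 + 2 + 2 + 6

# Resolvase step: add 10x T7 endonuclease I buffer, diluted enzyme and water.
digest_ul = annealing_ul + 5 + 8 + 17

pcr_buffer_x = final_concentration(10, 2, annealing_ul)  # strength of PCR buffer
t7_buffer_x = final_concentration(10, 5, digest_ul)      # strength of T7 buffer

# Composition of the 10x T7 endonuclease buffer (5 mL = 5000 uL final volume).
tris_mM = final_concentration(1000, 2500, 5000)
mgcl2_mM = final_concentration(1000, 500, 5000)
dtt_mM = final_concentration(1000, 25, 5000)
bsa_mg_per_ml = final_concentration(10, 50, 5000)

print(annealing_ul, digest_ul, pcr_buffer_x, t7_buffer_x)
print(tris_mM, mgcl2_mM, dtt_mM, bsa_mg_per_ml)
```

On these figures both buffers end up at 1x working strength, and the 10x T7 endonuclease buffer corresponds to 500 mM TRIS, 100 mM MgCl2, 5 mM DTT and 0.1 mg/mL BSA.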

Results

The results are summarised in Tables 2 and 3. 



TABLE 2

Error type           Untreated    Resolvase treated
A/T to G/C           6            1
G/C to A/T           12           7
A/T to deletion      1            1
G/C to deletion      6            3

TABLE 3

Clones               Untreated    Resolvase treated
Contained deletions  6/11         3/11
Contained mutations  9/11         7/11
Correct              2/11         3/11
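
The error counts in Table 2 can be totalled to compare the two samples. The sketch below assumes that the first value of each pair in the table belongs to the untreated sample and the second to the resolvase-treated sample (the column headings did not survive extraction), and it takes the 340 bp fragment length and eleven clones per sample from the method.

```python
# Error counts from Table 2; assignment of columns to samples is an assumption.
UNTREATED = {"A/T to G/C": 6, "G/C to A/T": 12, "A/T to deletion": 1, "G/C to deletion": 6}
TREATED = {"A/T to G/C": 1, "G/C to A/T": 7, "A/T to deletion": 1, "G/C to deletion": 3}

FRAGMENT_BP = 340
CLONES_PER_SAMPLE = 11


def total_errors(counts):
    """Total number of errors across all error types."""
    return sum(counts.values())


def errors_per_kb(counts):
    """Errors per kilobase of sequenced insert."""
    sequenced_bp = FRAGMENT_BP * CLONES_PER_SAMPLE
    return total_errors(counts) / sequenced_bp * 1000


def relative_reduction(before, after):
    """Fractional drop in total errors from `before` to `after`."""
    return 1 - total_errors(after) / total_errors(before)


print(total_errors(UNTREATED), total_errors(TREATED))
print(f"{errors_per_kb(UNTREATED):.2f} vs {errors_per_kb(TREATED):.2f} errors per kb")
print(f"reduction: {relative_reduction(UNTREATED, TREATED):.0%}")
```

Under that assumption the totals are 25 and 12 errors, a reduction of roughly one half, which agrees with the 50% reduction in error frequency reported for this method elsewhere in the specification.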



Discussion/Conclusion 



While overall the number of correct clones obtained was not significantly
different, there was a significant difference in the level of errors. This reduction in errors
becomes more significant as greater numbers of long oligonucleotides are joined into the
one construct, i.e., increasing the difference between untreated versus treated samples in the
chance of obtaining a correct clone. It is believed that combining another resolvase such as
T4 endonuclease VII may further enhance repair or increase the bias against errors.

Importantly, this experiment was not optimised, e.g., by using proofreading PCR
enzymes or optimised conditions. Finally, if the repair reaction is carried out during normal
PCR, for example, by including a thermostable resolvase, it is believed that amplification
of already damaged long oligonucleotides, and the normal accumulation of PCR induced
errors, even using proofreading polymerases during PCR, could be reduced significantly.
The repair of damaged long oligonucleotides is particularly important for synthesis of long
DNA fragments such as in Savines because, while the rate of long oligonucleotide damage
is typically <5%, after joining 10 oligonucleotides, the error rate approaches 50%. This is
true even using the best proofreading PCR enzymes because these enzymes do not verify
the sequence integrity using correct oligonucleotide templates that exist as a significant
majority (95%) in a joining reaction.
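
The way a small per-oligonucleotide damage rate compounds on joining can be illustrated with a short calculation. This is a sketch under a stated simplification: each oligonucleotide is assumed to be damaged independently with probability 5% (the upper bound quoted above), and PCR-induced errors are not modelled, so the result is a lower estimate than the figure given in the text.

```python
import math


def fraction_with_damage(per_oligo_rate, n_oligos):
    """Probability that a joined product contains at least one damaged oligonucleotide."""
    return 1 - (1 - per_oligo_rate) ** n_oligos


def oligos_for_majority_damage(per_oligo_rate):
    """Smallest number of joined oligonucleotides giving more than 50% damaged products."""
    return math.ceil(math.log(0.5) / math.log(1 - per_oligo_rate))


damaged_after_ten = fraction_with_damage(0.05, 10)
print(f"{damaged_after_ten:.1%} of ten-oligonucleotide products carry damage")
print(oligos_for_majority_damage(0.05), "oligonucleotides for a majority to be damaged")
```

Under this independence assumption about 40% of ten-oligonucleotide products contain at least one damaged oligonucleotide, the same order as the figure quoted above once PCR-induced errors are added.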



The disclosure of every patent, patent application, and publication cited herein is
incorporated herein by reference in its entirety.

The citation of any reference herein should not be construed as an admission that
such reference is available as "Prior Art" to the instant application.

Throughout the specification the aim has been to describe the preferred
embodiments of the invention without limiting the invention to any one embodiment or
specific collection of features. Those of skill in the art will therefore appreciate that, in
light of the instant disclosure, various modifications and changes can be made in the
particular embodiments exemplified without departing from the scope of the present
invention. All such modifications and changes are intended to be included within the
scope of the appended claims.

WHAT IS CLAIMED IS:

1. A synthetic polypeptide comprising a plurality of different segments of at least one 
parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide. 

2. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
single parent polypeptide. 

3. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
plurality of different parent polypeptides. 

4. The synthetic polypeptide of claim 1, wherein the segments in said synthetic 
polypeptide are linked sequentially in a different order or arrangement relative to their 
linkage in said at least one parent polypeptide. 

5. The synthetic polypeptide of claim 4, wherein the segments in said synthetic 
polypeptide are randomly rearranged relative to their order or arrangement in said at least 
one parent polypeptide. 

6. The synthetic polypeptide of claim 1, wherein the size of an individual segment is at 
least 4 amino acids. 

7. The synthetic polypeptide of claim 6, wherein the size of an individual segment is from 
about 20 to about 60 amino acids. 

8. The synthetic polypeptide of claim 7, wherein the size of an individual segment is 
about 30 amino acids. 

9. The synthetic polypeptide of claim 7, comprising at least 30% of the parent polypeptide 
sequence. 

10. The synthetic polypeptide of claim 1, wherein at least one of said segments comprises 
partial sequence identity or homology to one or more other said segments. 

11. The synthetic polypeptide of claim 10, wherein the sequence identity or homology is
contained at one or both ends of an individual segment.

12. The synthetic polypeptide of claim 11, wherein one or both ends of said segment
comprises at least 4 contiguous amino acids that are identical to, or homologous with, an
amino acid sequence contained within one or more other of said segments.

13. The synthetic polypeptide of claim 10, wherein the size of an individual segment is 
about twice the size of the sequence that is identical or homologous to the or each other 
said segment. 

14. The synthetic polypeptide of claim 13, wherein the size of an individual segment is 
about 30 amino acids and the size of the sequence that is identical or homologous to the or 
each other said segment is about 15 amino acids. 

15. The synthetic polypeptide of claim 1, wherein an optional spacer is interposed between 
some or all of the segments. 

16. The synthetic polypeptide of claim 15, wherein the spacer alters proteolytic processing 
and/or presentation of adjacent segment(s). 

17. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
neutral amino acid. 

18. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
alanine residue. 

19. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is 
associated with a disease or condition. 

20. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is 
selected from a polypeptide of a pathogenic organism, a cancer-associated polypeptide, an 
autoimmune disease-associated polypeptide, an allergy-associated polypeptide or a variant 
or derivative of these. 

21. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a 
polypeptide of a virus. 

22. The synthetic polypeptide of claim 21, wherein the virus is selected from a Human 
Immunodeficiency Virus (HIV) or a Hepatitis virus. 

23. The synthetic polypeptide of claim 22, wherein the virus is a Human
Immunodeficiency Virus (HIV) and the at least one parent polypeptide is selected from
env, gag, pol, vif, vpr, tat, rev, vpu and nef, or a combination thereof.

24. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a
cancer-associated polypeptide.

25. The synthetic polypeptide of claim 24, wherein the cancer is melanoma. 

26. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen. 

27. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a
melanocyte differentiation antigen selected from gp100, MART, TRP-1, Tyros, TRP2,
MC1R, MUC1F, MUC1R or a combination thereof.

28. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen. 

29. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen selected from BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3,
PRAME, TRP2IN2, NYNSO1a, NYNSO1b, LAGE1 or a combination thereof.

30. A synthetic polynucleotide encoding a synthetic polypeptide comprising a plurality of 
different segments of at least one parent polypeptide, wherein the segments are linked 
together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide. 

31. A method for producing the synthetic polynucleotide encoding a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said method comprising: 

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of the at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

32. The method of claim 31, further comprising fragmenting the sequence of a respective 
parent polypeptide into fragments and linking said fragments together in a different 
relationship relative to their linkage in a respective parent polypeptide sequence. 

33. The method of claim 32, wherein the fragments are randomly linked together.

34. The method of claim 31, further comprising reverse translating the sequence of a
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence 
encoding said parent polypeptide or said segment. 

35. The method of claim 34, wherein an amino acid of a respective parent polypeptide 
sequence is reverse translated to provide a codon, which has higher translational efficiency 
than other synonymous codons in a cell of interest. 
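
By way of non-limiting illustration only, the codon selection of claims 34 and 35 might be sketched in Python as follows. The function names and the usage weights in the example are hypothetical and are not taken from the specification; only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def preferred_codons(usage):
    """Map each amino acid to its synonymous codon with the highest usage weight.

    `usage` maps codon -> relative frequency in the cell of interest. Codons
    missing from it count as 0.0, and ties keep the first codon in TCAG order.
    """
    best = {}
    for codon, aa in CODON_TO_AA.items():
        weight = usage.get(codon, 0.0)
        if aa not in best or weight > best[aa][1]:
            best[aa] = (codon, weight)
    return {aa: codon for aa, (codon, _) in best.items()}


def reverse_translate(peptide, usage):
    """Reverse translate a peptide using the preferred codon for each residue."""
    table = preferred_codons(usage)
    return "".join(table[aa] for aa in peptide)


def translate(dna):
    """Translate an in-frame DNA sequence back to one-letter amino acids."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical usage weights, for illustration only (not measured data).
example_usage = {"GCC": 0.40, "GCT": 0.26, "AAG": 0.58, "AAA": 0.42}
dna = reverse_translate("AKA", example_usage)
print(dna)             # GCCAAGGCC
print(translate(dna))  # AKA
```

In practice the usage table would be supplied for the particular host cell; the sketch only shows how a per-residue choice among synonymous codons could be made.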

36. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence that is refractory to 
the execution of a task. 

37. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence selected from a 
palindromic sequence or a duplicated sequence, which is refractory to the execution of a 
task selected from cloning or sequencing. 
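
By way of non-limiting illustration only, the avoidance of palindromic or duplicated sequences recited in claims 36 and 37 might be sketched as below. The greedy strategy, the function names and the window lengths (6 bases for palindromes, 9 bases for duplicates) are assumptions made for the example, not features of the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def has_palindrome(dna, k=6):
    """True if any k-base window equals its own reverse complement."""
    return any(dna[i:i + k] == reverse_complement(dna[i:i + k])
               for i in range(len(dna) - k + 1))


def has_duplicate(dna, k=9):
    """True if any k-base window occurs more than once."""
    seen = set()
    for i in range(len(dna) - k + 1):
        window = dna[i:i + k]
        if window in seen:
            return True
        seen.add(window)
    return False


def reverse_translate_avoiding(peptide, preferences, pal_k=6, dup_k=9):
    """Greedy reverse translation that avoids palindromes and duplicates.

    `preferences` maps each amino acid to its synonymous codons, most
    preferred first. For each residue, the first codon that keeps the growing
    sequence free of palindromic and duplicated windows is used; if no codon
    qualifies, the most preferred codon is kept.
    """
    dna = ""
    for aa in peptide:
        choices = preferences[aa]
        for codon in choices:
            candidate = dna + codon
            if not has_palindrome(candidate, pal_k) and not has_duplicate(candidate, dup_k):
                break
        else:
            candidate = dna + choices[0]
        dna = candidate
    return dna


# Glu-Phe encoded as GAA TTC would create the palindrome GAATTC,
# so the second-choice Phe codon is selected instead.
prefs = {"E": ["GAA", "GAG"], "F": ["TTC", "TTT"]}
print(reverse_translate_avoiding("EF", prefs))  # GAATTT
```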

38. The method of claim 31, further comprising linking a spacer oligonucleotide encoding 
at least one spacer residue between segment-encoding nucleic acids. 

39. The method of claim 38, wherein the spacer oligonucleotide encodes 2 to 3 spacer
residues. 

40. The method of claim 38 or claim 39, wherein the spacer residue is a neutral amino acid. 

41. The method of claim 38 or claim 39, wherein the spacer residue is alanine.

42. The method of claim 31, further comprising linking in the same reading frame as other 
segment-containing nucleic acid sequences at least one variant nucleic acid sequence 
which encodes a variant segment having a homologous but not identical amino acid 
sequence relative to other encoded segments. 

43. The method of claim 42, wherein the variant segment comprises conserved and/or
non-conserved amino acid differences relative to one or more other encoded segments.

44. The method of claim 43, wherein the differences correspond to sequence 
polymorphisms. 

45. The method of claim 44, wherein degenerate bases are designed or built in to the at 
least one variant nucleic acid sequence to give rise to all desired homologous sequences. 
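
By way of non-limiting illustration only, the use of degenerate bases in claim 45 might be sketched as follows, using the IUPAC nucleotide ambiguity codes. The function names are hypothetical and the example sequences are made up.

```python
from itertools import product

# IUPAC nucleotide codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate(variants):
    """Collapse equal-length variant sequences into one degenerate sequence."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must all have the same length")
    return "".join(BASES_TO_CODE[frozenset(column)] for column in zip(*variants))


def expand(degenerate_seq):
    """List every plain sequence that a degenerate sequence can give rise to."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in degenerate_seq))]


print(degenerate(["GAA", "GAG"]))  # GAR
print(expand("GAR"))               # ['GAA', 'GAG']
```

A degenerate sequence built position by position can also give rise to sequences that were not among the inputs: `degenerate(["AAA", "GGG"])` is `RRR`, which expands to eight sequences. A design step may therefore need to check the expansion against the desired set of homologous sequences.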

46. The method of claim 31, further comprising optimising the codon composition of the 
synthetic polynucleotide such that it is translated efficiently by a host cell. 

47. A synthetic construct comprising a synthetic polynucleotide encoding a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage 
in the at least one parent polypeptide to impede, abrogate or otherwise alter at least one 
function associated with the parent polypeptide, wherein said synthetic polynucleotide is 
operably linked to a regulatory polynucleotide. 

48. The synthetic construct of claim 47, further including a nucleic acid sequence encoding 
an immunostimulatory molecule. 

49. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises a domain of an invasin protein (Inv). 

50. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises the sequence set forth in SEQ ID NO: 1467 or an immune stimulatory 
homologue thereof. 

51. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule. 

52. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule selected from a B7 molecule or an ICAM molecule. 

53. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a B7 
molecule or a biologically active fragment thereof, or a variant or derivative of these. 

54. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a
cytokine selected from an interleukin, a lymphokine, tumour necrosis factor or an 
interferon. 

55. The synthetic construct of claim 48, wherein the immunostimulatory molecule is an
immunomodulatory oligonucleotide. 

56. An immunopotentiating composition, comprising an immunopotentiating agent 
selected from the synthetic polypeptide of claim 1, the synthetic polynucleotide of claim 30 
or the synthetic construct of claim 47, together with a pharmaceutically acceptable carrier. 

57. The composition of claim 56, further comprising an adjuvant. 

58. A method for modulating an immune response, which response is preferably directed 
against a pathogen or a cancer, comprising administering to a patient in need of such 
treatment an effective amount of an immunopotentiating agent selected from the synthetic 
polypeptide of claim 1, the synthetic polynucleotide of claim 30, the synthetic construct of
claim 47, or the composition of claim 56. 

59. A method for treatment and/or prophylaxis of a disease or condition, comprising 
administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the synthetic polypeptide of claim
1, the synthetic polynucleotide of claim 30, the synthetic construct of claim 47, or the 
composition of claim 56. 

60. A computer program product for designing the sequence of a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said program product comprising: 

- code that receives as input the sequence of said at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that links together said fragments in a different relationship relative to their 
linkage in said parent polypeptide sequence; and 

- a computer readable medium that stores the codes.

61. The computer program product of claim 60, further comprising code that randomly 
rearranges said fragments. 

62. The computer program product of claim 60, further comprising code that links the 
sequence of a spacer residue to the sequence of said at least one parent polypeptide or to 
said fragments. 
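
By way of non-limiting illustration only, the code elements of claims 60 to 62 (receiving a parent sequence, fragmenting it, rearranging the fragments and linking them with an optional spacer) might be sketched as below. The fragment size of 30 residues, the overlap of 15 residues, the 'AA' spacer and all function names are assumptions made for the example.

```python
import random


def fragment(sequence, size=30, overlap=15):
    """Cut a parent sequence into fragments of `size` residues that overlap
    their neighbours by `overlap` residues (both values are assumptions)."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    if len(sequence) <= size:
        return [sequence]
    starts = list(range(0, len(sequence) - size + 1, step))
    if starts[-1] + size < len(sequence):
        starts.append(len(sequence) - size)  # final fragment reaches the C-terminus
    return [sequence[s:s + size] for s in starts]


def rearrange(fragments, rng):
    """Return the fragments in a random order that differs from the parent order."""
    order = list(range(len(fragments)))
    while len(order) > 1 and order == sorted(order):
        rng.shuffle(order)
    return [fragments[i] for i in order]


def link(fragments, spacer=""):
    """Join fragments, optionally interposing a spacer such as 'AA'."""
    return spacer.join(fragments)


parent = "ACDEFGHIKLMNPQRSTVWY" * 3   # made-up 60-residue parent sequence
pieces = fragment(parent)             # three fragments of 30 residues
design = link(rearrange(pieces, random.Random(0)), spacer="AA")
print(len(pieces), len(design))       # 3 94
```

Reshuffling until the order differs from the parent order is one simple way of ensuring that the fragments are linked in a different relationship; other rearrangement rules could equally be used.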

63. A computer program product for designing the sequence of a synthetic polynucleotide 
encoding a synthetic polypeptide comprising a plurality of different segments of at least 
one parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide, comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that reverse translates the sequence of a respective fragment to provide a 
nucleic acid sequence encoding said fragment; 

- code that links together in the same reading frame each said nucleic acid 
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in 
which said fragments are linked together in a different relationship relative to their 
linkage in the at least one parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

64. The computer program product of claim 63, further comprising code that randomly 
rearranges said nucleic acid sequences. 

65. The computer program product of claim 64, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon, 
which has higher translational efficiency than other synonymous codons in a cell of 
interest. 

66. The computer program product of claim 63, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon 
which, in the context of adjacent or local sequence elements, has a lower propensity of
forming an undesirable sequence that is refractory to the execution of a task. 

67. The computer program product of claim 63, further comprising code that links a spacer 
oligonucleotide to one or more of said nucleic acid sequences. 
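
By way of non-limiting illustration only, the in-frame linking of claims 63, 64 and 67 might be sketched as follows. The codon choice (the first synonymous codon in TCAG order), the spacer oligonucleotide GCTGCC (two alanine codons), the seed and the function names are all assumptions made for the example.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder codon choice: the first synonymous codon in TCAG order.
AA_TO_CODON = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODON.setdefault(aa, codon)


def translate(dna):
    """Translate an in-frame DNA sequence to one-letter amino acids."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def design_polynucleotide(fragments, spacer_oligo="GCTGCC", seed=0):
    """Reverse translate peptide fragments, rearrange the resulting nucleic
    acid sequences and link them in the same reading frame.

    Returns the polynucleotide and the order in which fragments were linked.
    """
    if len(spacer_oligo) % 3 != 0:
        raise ValueError("spacer must be a whole number of codons to keep the reading frame")
    coding = ["".join(AA_TO_CODON[aa] for aa in frag) for frag in fragments]
    order = list(range(len(coding)))
    rng = random.Random(seed)
    while len(order) > 1 and order == sorted(order):
        rng.shuffle(order)
    return spacer_oligo.join(coding[i] for i in order), order


fragments = ["MKT", "WYV", "DEH"]     # made-up peptide fragments
dna, order = design_polynucleotide(fragments)
# Translating the result recovers the rearranged fragments separated by Ala-Ala.
print(translate(dna) == "AA".join(fragments[i] for i in order))
```

Because every fragment and the spacer are whole numbers of codons, the linked sequence stays in a single reading frame, which the final translation check confirms.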

68. A computer for designing the sequence of a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein the segments are 
linked together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide, wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium, for processing said machine readable data to provide said 
synthetic polypeptide sequence; and

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polypeptide sequence. 

69. The computer of claim 68, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments and 
linking together said fragments in a different relationship relative to their linkage in the
sequence of said parent polypeptide. 

70. The computer of claim 68, wherein the processing of said machine readable data 
comprises randomly rearranging said fragments. 

71. The computer of claim 68, wherein the processing of said machine readable data 
comprises linking the sequence of a spacer residue to the sequence of said at least one 
parent polypeptide or to said fragments. 

72. A computer for designing the sequence of a synthetic polynucleotide encoding a
synthetic polypeptide comprising a plurality of different segments of at least one parent 
polypeptide, wherein the segments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide to impede, abrogate or otherwise alter at 
least one function associated with the parent polypeptide, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium, for processing said machine readable data to provide said 
synthetic polynucleotide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polynucleotide sequence. 

73. The computer of claim 72, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid 
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

74. The computer of claim 72, wherein the processing of said machine readable data 
comprises randomly rearranging said nucleic acid sequences. 

75. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon, which has higher translational efficiency than other synonymous codons 
in a cell of interest.

76. The computer of claim 72, wherein the processing of said machine readable data
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon which, in the context of adjacent or local sequence elements, has a lower 
propensity of forming an undesirable sequence that is refractory to the execution of a task.

77. The computer of claim 72, wherein the processing of said machine readable data 
comprises linking a spacer oligonucleotide to one or more of said nucleic acid sequences. 

[Figure: annotated sequence; legible labels: disulfide bonding, rev, nls, MUTATED AAs]

Full length ~17000 bp

Cassette A ~5600 bp
Cassette B ~5600 bp
Cassette C ~5800 bp

Sites labelled on the cassette maps: XbaI*, BamHI, SalI, XhoI, BglII, EcoRI, XhoI*, SalI*
Map annotations: SalI/XhoI destroyed; XhoI intact

Full length construction after cloning the cassettes into pBS
Sites marked with a * are in the pBS MCS

Cassette Extras (Can be removed from cassette ends)

A (37bp)  BamHI/Kozak  Start  SalI  Stop  BglII  EcoRI
5' gc ggatccacc atg .... gtcgac tga agatct gaattc gc 3'

B (43bp)  BamHI/Kozak  Start  XhoI  XhoI  Stop  BglII  EcoRI
5' gc ggatccacc atg ctcgag ctcgag tga agatct gaattc gc 3'

C (37bp)  BamHI/Kozak  Start  XhoI  Stop  BglII  EcoRI
5' gc ggatccacc atg ctcgag ... tga agatct gaattc gc 3'

FIGURE 14

Cassette Construction

Full Length 5687bp
A1-A4 3330bp
A5-A8 2500bp

Subcassettes
A1/A2 1670bp
A3/A4 1670bp
A5/A6 1670bp
A7 840bp

Sites labelled at subcassette ends: XbaI, SpeI; NheI, SpeI | XbaI, SpeI | XbaI, NheI
Sites labelled at subcassette joins: SpeI/XbaI, AvrII/NheI, AvrII/NheI

Subcassette Extras (Can be removed from cassette ends)

Subcassette              5' sites      5' sequence            3' sites      3' sequence
SC1 (A 28bp, B/C 34bp)   As for 5' of Cassettes               SpeI EcoRI    actagt gaattc gc 3'
SC2 (28bp)               BamHI XbaI    5' gc ggatcc tctaga    NheI EcoRI    gctagc gaattc gc 3'
SC3 (28bp)               BamHI SpeI    5' gc ggatcc actagt    AvrII EcoRI   cctagg gaattc gc 3'
SC4 (28bp)               BamHI NheI    5' gc ggatcc gctagc    XbaI EcoRI    tctaga gaattc gc 3'
SC5 (28bp)               BamHI SpeI    5' gc ggatcc actagt    AvrII EcoRI   cctagg gaattc gc 3'
SC6 (28bp)               BamHI NheI    5' gc ggatcc gctagc    XbaI EcoRI    tctaga gaattc gc 3'
For Cassettes A and B only
SC7 (37bp)               BamHI NheI    5' gc ggatcc gctagc    As for 3' of Cassettes A/B
For Cassette C only
SC7 (28bp)               BamHI NheI    5' gc ggatcc gctagc    SpeI EcoRI    actagt gaattc gc 3'
SC8 (31bp)               BamHI XbaI    5' gc ggatcc tctaga    As for 3' of Cassette C

FIGURE 14 (Cont)



WO 01/090197 



PCT/AU01/00622 



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Kozac 



BamHJ 



Start 



' I , | . h • : env 185-214(149) - : 

(^GGATdCAC^TdACAGGCCCT^ 

C cdcCTAOGTGOT AOTGTC CGGG AA CGTKTTTGCAGTCGWGGC ACGTTACGTGTGTGCCTTAGTyTGGGCA GC AC AGGTG 
1 ' MTGPCXNVSXVQCTHGIXPVVST 




120 



130 



gag 76-105 (6) 



T> 
160 



R AAGC CTCl'WCAATACCHTCGCCAC ACTGTGGTGCGTCCACCAAAGGATTG A SG 
AWTTATGGYAGCGGTGTGACACCACGCAGGTGGTTTCCTAACTSC 
LLLNGSL'x S L X N T X A T LWCVHQ R I X> 



190 



200 



210 



220 p 0 | 31-60(36) 



TCARGGAOXOVAAGGAAGCCCTCGACAAAATCGA^CTCGGCG^ 

AGTYCCTGTCTTTCCTTCGGCAGCTGTTTTAGCTjGAGCCGCT ACCGCCTCCGCGACTWTtXGTTCCGTGGAGGTCGAGG 
VX DTKEALDK I B ' L GDGGGAXRQ GT S SS> 



250 



260 



270 



280 



290 



300 



310 



320 



YTC A R CTTTCC AC AAATCAC ACTGTGGCAAAGGCCTC1 
RACTYGAAAC 

XXPPQITLWQ R P L V T ' 



.TCCCGAWATGGTGAT 
TCTlW rK TCTTAGGGCTWTACCACTA 
PFRXXNPXMVI> 



pol 316-345 (55) *5o 



360 



370 



360 



390 



400 



TTACCAGTACATGGACGATCTGTATGTGGGAAGCGATCTGGAA 
AATGGTCATGTACCTGCTAGACATACACCCTTCGCT'AGACCrTrAGCCTG^ 

YQYMDDLYVG SDLBIGQH'FTTPDK KH> 



"° pol 361-390 (58) "° 



450 



460 



470 



480 



AAAAGGAACCACCATTCCTCTGGATGGGATACGAACTGC 
TTTTCCTTGGTCGTAAGGAGACCTA«:CTAl«rrTGACCT 

0 K BP P FLWHGY E LH PDRWTVQP'XX F P Q> 



490 



500 



pol 46-75 (37) 530 



540 



550 



ATT AC CCTCTGGC AGCCTCCCCTCGTC iA CARTCAAAATCGCCGGACAGCTCA 
TAATCGGAGACCGTCC C AGOGGAGCACTGTY AGTTTTAGCCGCCTX?TCGAiOTWTCTCCGAGACGAG C TCTGTCC 
I TLWQRPLVTX K ICGQLXEA L L DT G 1 



570 



580 



530 tat 46-75 (121) 



620 



630 



560 



A 

iGGRT 
S X> 

640 



TCGCAGAAAGAAACGTA6GCAACCTAGASGCGCTCCTCAGAGCAG 

ACCGTCrrrt-'rrrGCAl t C GTrtC^TCTSCGCGAGGAGTCTCGTCK Y TCCTAGTGGTl^TGGGATAGRG^ACTCGTTGGGG 
G RRKRRQRRXAPQSXXDHOYPIXEQP> 



650 



660 



670 



680 




pol 1-30 (34) 



710 



720 



TCY 
AGRi 
L X 



AGGTRAAGCCAGAGAGTTTYCCAGCGAACAGACARGAGCCAATAGC 
ATCCCTTTrGC^ACCGAAAGGXCGTTCCAYTTCGGTCTC^ 
FFR E M L A FXQGXAREFXSEQTXAN S> 



730 



740 



750 



rev 106-122 (131) 

Y CC RC CTCCAGGAAC^AGCXXXXAAATCTCCGGCGAAA GCTC CGYC RTTCTGGGA' 
RGGY 



'GGAGGTCCTTQTCGGGGGTTTA 

xsrk'spqi 



•AGAGGCCGCTTTCGAGGCRGY 
S G E 



810 



820 



830 




gag 91-120(7) 



gAGAATCGAMGOt^AGATACCAAAGACC 

^ftTCTT ACCTWC ACTY TCT A 'l G&TTTC?ItXGAGA CCTATTCTAACTCCTCCWC G VI VI STT 1' I CG KTCGTTTTC TGTG'ITG 
' R IXVXDTKEAL D K I BEXQXKSXQKTQ> 



690 



900 



910 



920 



AGGCT 




pol 601-630 (74) 950 



960 



CCGGATACGTCACCCATAGGGGAAGGCAAAA^^ 
TCCGJVP^CJfaTTCGCtCTATGCAGTOQCTATCCC 

OA aa'kacyvtdrgrqkxxsltxttnqk> 
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970 980 990 iooo ioio env 46-75 (140) 1040 

ACCGAACTGCAWGCCATTCAMSAMGCCWn'ACCACACTGT^ 
TGGCTTGACGTWGGTAAGTXTTKCXX^ 
T E L X A I X X A X T T L PCAS DAKAXXT EV H> 

1050 1060 1070 1080 1090 pof 76-105 (39) 1120 

* * * * * * ' * 

CAATGTGTGGGCCAC AC ACGC^TTGCGTCCXt|cX^lX^CGATACAGTGCTCGAGG A S ATSAACCTCCCC GGAARATGG AAGC 
GTTACACACXCGGTGTGTCCGAACGCACGcdCGACTCCTATGTCACGACCr^^ 

n vwathacvp'addtvlexxnlpgxwk> 

1130 1140 1150 1160 1170 1180 1190 1200 

******* » 

ctaagatgattggcggaatcggcggattcattaacgtga^ 
ga1tctactaaccgccttagccgcctaagtaattccactcttyctagcct 

F I K V R ' X 




GCTATCAAGAAAAAGGACTCCACCAAATGCAGAAAGCTCGTGGATTTCAGfl 
CGATAG'l w rC M i*fTTTCCTGAGG'ilj^TPlACCTC^ 
- A I K KK DSTKWR K LV D F R ' X R I IXI L Y Q S> 

1290 rev 16-45 (125) 1320 1330 1340 1350 13S0 
* ******* 

CAATCCCTATCCTAGCTCCGAAGGCWCCAGGCAARCCAGAARGAATA^ 

N PYPSSEGXRQXR XNRRRRtt'GGEXXR> 

1370 1380 env 525-554 (171) 1410 1420 1430 1440 

ATACGTC CGTCAGACTGGTCA RCCCATT l. ' 1 * * AGCCCTCGCCTCGCACGATCTGAGAARCCTCTGC CTCTTQGAMAACCTC 
TATCCACGCACTCTGACCAGTYCCCTAAGARTCGGGAG^ 

DRS VRLVXGFXALA WDDLRXLCLF XN L> 

1450 1460 1470 env 31-60 (139) 1500 1S1 ? 152 ° 

TGGGTCACCGTCTACTATGGCGTCCCCGTCTGGAGAGASGCTRMCAC AACCCTCTTC TGTGCC TCCGACGCTAAGGCTY A 
ACCCAGTGGCAGATGATACCGCAGGGGCAGACCTCTCTSC^ 
WVTVYYGVPVWRXA X T T I# FCAS DA X A X> 

spacers 1550 isso wv ^ m ^0 (124) 1590 1600 



C GCTGCC ATGGCTGGCAGAAGCGGCRRCACAGACGAAGAGCTCCTGAR^ 

qcGACGqrAccGAcrorcrit:GCCGYYGTGTcrtx*r 

MAGRSGXTDEELLXAXRIIXXLYO 



1610 1620 1630 1640 1650 vif 16-45 (101) 1680 

* * * * * * 

CCAACCXrrTACCCTTCdg^ 

GGTTGGGAATGGGAAC4j|%%g^ join 
SNPYPSASMXIRTWXSLVKHHMXISKK> ^3 



1690 1700 1710 1720 1730 1740 1750 1760 

* * * * * * * * 

GCC AA WGGCUtyt/ri X. TATAGGC ATCACTKTGASI^GTOCGAGSTCGTt^RTCA GATT ATCGAA VAGCTC ATCAAAAAGGA 
CGGTTWC CGACCAAGATATCCGTACTG AWACTSpTC A GGCTC S AGC ACTY AGTCTAATAGCTreTCGAGTAGTTTTTCCT 
AXGWFYRHHXX ESE XVXQIIEXLI KKE> 

Dol 661-690 (78) 1790 1800 1810 1820 1830 1840 

AARGGTCT ACCTA KCATGGCT ACCAGCCCTlC AAGGGAATCG GflCAAAC^ YC AAAATCC 

TTYCCAGATGGATMGTACCCAITSGTC GG GTGTrC C C rT A GCCll GT Tr GC T^T C TCGA l^lXJ T l ' Jl CTCTAAK RGTTTTAGG 
XVY LXWVPAHKGIG QTK ELQXQIXK I> 



i°50 pot 91 6-945 (95) 1880 1890 1900 1910 1920 
* r * ' * * * * * 

AAAACTTTAGGGTCT ACT ATAGGGATAGCAGAG AC CCT*fTCTGGAA»GGGACC<j AAAAGC YTTGAGGAAATCTGGRAC AAT 
TTrTGAAATCCCAGATGATATCCCTATCGTCTtrrGGGAKA^ 

QMPRVYYRDSR DPXWKGp'ksXEE I W X W> 
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1930 env 405-434 (163) i960 1970 1980 1990 2000 
* * » • 1 * * 

ATGACATGGATKSAGTGGGAGAGAGAGATTAGCAATTACACAARCCWAAT^ 

TACTGTACC TAM STC ACCCTCTCTCTCTAATC GTTAATGTGTTY GGWTTAGATA YTCTTA AG ACrrY TGGGCTTGGGTGTCG 
MTWXXWEREISNYTXXlYXIlJxPEPTA> 

2010 2020 gag 451-480 (31) 2050 2060 2070 2000 
cccTcccGCTGAGARTTTCRGATTCOGTGAGGAAACTACAcccTcccMAAAGCAAGA 

GGGAGGGCGACTCTYAAAGYCTAAGCCAXTTCCTT^ 

ppaexfxfgeettpsxkqexkdke'qy> 

2090 2100 2110 DOl 106-135(41) 21" 2150 2160 

* ♦ * r * 7 # * 

A TC AGATTMTTATTG AGATTTGCGCCAAGAA AGCT ATTGGT AC AGTGCTCGTGGGACCTAC CC CTGTG AATATCATTGG C 
T AGTCTAAXAATAACTCTAAACCCL G'/l^I TTTL GATAAC CA TGTC ACGAGCACCCTGGATGGGGACACTTATAGT AACCG 
DQIXIEICCKKAICTVLVGPTPVNI IG> 



2170 2180 2190 2200 Vpr 46-75 (1 1 5) 2230 ' 2240 

AGHATITACGAAACCTATGGCGATACCTtK^AGG 
TCTJTAAATCCTTTGGATACCGCTATGCACC^^ 
R , I YETYGDTWEGVEALIRXLQQ LJIFXH> 

2250 2260 2270 2280 2290 tat 31-61 (1 20) 2320 

TTTCAGAATeX^GjJreTTWTttZATTG 

AAAGTCTTAGCCTOCAAWAGTAACGGTTSACACAAAAG^ 

FBI G'C XHCQXCFLTKGLG1SX G R K K R> 

2330 2340 spacers 2370 2380 tat 1-30 (118) 



RACAGAGAAGGSGAGCTCCCCAJ GCTGCC ^TGGACCC C GTGGIkCCCCAASCTSGAGCCTTSGIUfcliiCACCCTQG^TCCCAC 
YTOTCTCTTCC SC TCGAGCCCT1 CGACGC rACCTGCGGCACCTCGCCTTSGACCTCXXJAA 

X QRRXAPQAA M OPVO PXLE PWXH PG S Q> 



2410 2420 2430 2440 2450 2460 2470 2480 



CXTTAMGACAGCCTGT*OICAAATGCTATTGCAAAAAGTG^^^^^M GAAG AGACAACCCCTAGCCMGAAACA GGAACMGAA A3 
GGATKCTGTCGGACAWKGTrTACGA7AAC <yrrin ' J^ j 0 j n 

A4 



PXTACXKCYCKKCPSBETTPSXKQEXIO 

gag 466-495 (32) 251 ? 252 ? 253 ? 254 ? 255 ? 25e ? 

ACACAAAGAACWCraCCCCCCTTYAGCCAG^ 
TCTGTTTCTTGWGATGGGGGGAARTOJG^ 

D K E X Y PPXAS LK S LFGND'NFW M W K N X> 

2570 enV 91-120 (143) 26 °0 "10 2620 2630 2640 

* * fr • I * » 

TGGTGGAMCAGATGCAHGAAGACRTTATCTCACTATGGGACCAAAG^ 
ACCACCTKGTCTACGTKCTTCKrYAATAGAGTGATAC^ 

MVXQMXEOXI S LWDQSLKPCVK LDVGD> 

2650 2660 po 1 256-285 (51) 2690 2700 2710 2720 



GCCTATTTCTCCGTGCCTCTGGATRAARRCTTCAGAAAGTA^ 

CGGATAAAGAGGCACGGAGACCTAYTrfYGAAGTCITTC ATAT^ T Q GTT G A 
AYPSVPLDXX PRKYTAPTIPSXNWE'QL> 

2730 2740 2750 pol 751-780 (84) 2780 2790 2800 

GAAAGGCGAAGCCATSCATGGCCAAGTGRATTGCTCA 
CTTTCCGCrTCGGTASGTACCU^TrtJlCYTAACGAGT^ 

KGEAXHGQVXCS PGXWQ LDCTHLEGK> 

2810 2820 2830 2840 n 0 | 166-195 (45) 2870 2880 

TTATaCCTAAGGTCAAGCAATGGCCTCTGACAGAGGAAAAGATTA 
AATAGGGATTCCACT1CGTTA£CGGACACTGTCTCCT^ 

X I P K V K QWPLTEEK I KALTXIC X E M E X> 
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2890 2900 2910 ^,33^350(56) 2940 2950 2960 

GAGGGAAAGATTAGUATGGATG^ 
CTCCCTTTCTAATCUTACXTACTGG^^ 
EG RIs'mDD LYVGSDLEI G 0 H R X K I E E L> 

2970 2980 2990 3000 p(>| ^ ^ ^ 3030 3000 

CAGGSMACACCTCCTGARATGGGGflCTCACCGAJto 

gtccsktgtggaggactytacccctIgagtggct k tg g t g tt^ 

R X H L L X W G J L TXTTNQKTELXAIXLAL> 



3050 30 6 0 3070 3080 3090 po| 796 . 825 (87) 3 "? 

AAGACTCCGQCTYACAGCTCAACATTCTGACACftQ^ 
WrrGAGGCCGARTCTCCAGTTGTAACACTGTCTOT 

QDSGXBVNIVTD 1 ! PAETGQBTAYFXL K> 



3130 3140 1150 3160 3170 3180 3190 3200 

» * » * . * » • • 

CTGGCTGGCA6ATGGCCTGTGAHARYCATTCACACAGACAATCGc|a 
GACCGACCCTCTACXXGACACTKTyRCTAAGTGTC 
LAGRWPVXX"IHTDNG 1 RTX.IEELRXHLL> 

pol 346-375 (57) 3230 3240 3250 3260 3270 32 8p 

- * * * » * * 

CARATGGGCCTTCACAACCCCTGACAAAAAGCATCAGAAAGAGCCTC 

GTYTACCCCGAAGTGTTGGGGA C ' J " L» ' J * ri ' J " It_ GTAGTCTTTCTCGGAGGGAAAG A(^^^^^g CAGT^JXJTI'ft j ACTGTCTCC JC 
XWGFTTPDKKHQX EPPP.LSSVKKL.T E> 

3290 vif 166-192 (111) 332 » 333 » spacers 3360 
* ' * * ^ | -i * 



ATARGTGGAACRAACCCCAGAAAAYCAAGGGACRCACAGRAAATCACACAATGAATGG SCTGCC \CAGAGTCCCAG 
T ATYCACCTTGYTTGGGGTCTTTTRGTTC CC7TGYGTCTCYTTTAGTGTGTTACTTACCGGT7 CCACGC PGTCTCAGGGTC 
DXWNXPQKXKGXRXNHTMNGHAATES Q> 



3370 3380 en v 435-464 (165) 3410 3420 3430 



3440 



AATCAGCAA6ACAGAAACGAAMAGGAMCT GC T GC MGCTCGACAAATGGGCAAGC 
TTAGTCGTTCTGTCTTTtSCTTKTCCTKGACGACCXCGAGCT 
N Q QDRWEXXL L X LDKWAS L W N W FX I X 1 D> 

3450 3450 3470 gag 121-150 (9) 350 ° 351 ° J52 ° 

CACCGGAARTAGCTCCMAAGTGTCCCAGAAWACCCTATCGT^^ 
GTGQCCTTYATCGAG GKT TCACAG GGYC T T AATGGGATAGCAQGTC 

TGXSSXV50NYPIV0NXQGQIIVHQXX> 

3530 3540 3550 3560 env 480-509 (168) 3590 3600 

• * * * * - ' * * 

CCCCCAG f jcT CR TCGGACTGAGAATC Rl 1 1 1 CGC TGTGCTC AGCATTRTCAAT AGCGTC ACGCAAGGCT AT AGCCCTCTG 

gggggtctJgagyagcctgactcttagyaaaagcg^ 

S PR»LXGLRIXPAVLSIXHRVRQGYSPL> 

3610 3620 3630 3640 3650 Vlf 106-135 (107 3680 

TCCTTCCAAACCCTCMYOCTCATCCATCTGYA 
AGGAAGGTTTGGGAGKROGAGTAGGTAGACRTWATGAAACTGA 
SFQTLX , 1>IHLXYPDCFXDSXIRRAILG> 

3690 3700 3710 3720 3730. 3740 3750 3760 

» ♦ * * * » * # 

ACAS A XAGTGAGHAGGAGATGCGAATAQCCTGTGGGAK^ 

TGT STKlt! ACTCKTCCTCT ACGCTTATGC GAC ACC CTKAG G R AACCGAAAGACCCA CG GC GACCGAGGT 

X XVXRRCEY'AVGXGAMXXGPLGAAG S> 



env 300-329 (156) 



3790 3B00 3810 3820 3830 3840 



ccatgcccgctgcctccatsacactgacagtgcaagcqr 
ggtacccgcgacggaggtastgtgactgtcacgttcgdata 

tmgaasxtltvqa'ydps kdlxaeiqxq> 
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pol 466-495 (65) 



3880 



3890 



3900 



3910 



3920 



GGTCAGGRTCAGTGGACAT*rTCAGATTTWCCAAGA( 
CCAGTCC Y AGTCACCTGTAWAG7XTTAAAWGGTTCTO 
GQXOWTXQI XQBP 




3930 



pol 121-150 (42) 3960 



F K N 

3970 



CCGTC CTGGTC GGCCCT ACACCC GTC AA 
AGGACCAGCC GGGATGTGGGC AGTT 
GTVLVGPTPVN> 



3980 



3990 



4000 



C ATCATCGGAAGGAACMTGCTGACACAG3ITTGGCYGCAC CCTCAACTTTCCCATTAG<jAAAGGCAGCCCTGCTATCTTTC 
GTAGTAGCCTTCCTTGKACGACTGTGTCKAACCGRCtWGGXJAGT^ 

fpis'kgsp 



IGRHXLTQXGXTLN 



CGATAGAAAG 
A I F> 



4010 



402 J pol 301-330 (54) 405 ? 



4060 



4070 



4080 



A GTC C AGC ATGMC AMA GATTCTGGAUL' CJTf i 1 A GG A W AMAAAA C C CTGA S A TGGTC ATCT ATCAGTAT^^^^^Q C CTCTG 
TCAGGTCGTACXGTICTCTAAGACCTCGGAAAATCCT^ 

OS SMX XI LEPFR. XXNPXMVIYQYPs' PL> 



4090 



4100 



4110 nef 136-165 (188) 



4150 



4160 



ACATTCGGATGGTGTTK^AACTGGTCCCt^TGGACCCCAGSGA 
TGTAAGCCTACCAiCAAAGTrrGA<XAGGGGCACCTGGGGTC ^ 
TFGWCFK I, VPVDPXEVEBXNXGENNCL> 



4170 



4180 



4190 



4200 



CCTOTTT* 
GCAOAAA7 



pol 271-300 (52) 



4230 



4240 



PTTACCAAATACACAGCCTTTACCAlT^CCTCCAYCAATAACGAAACCCCT<XX!ATrAGGTATCAG^ATAACGTCC 
ATCCTITATGTGTCGGAAATGCTAAGGGAGGTRGT^ 
L'PRKYTA FT I P SX KNETPG IRYQYN V> 



4250 



4260 



4270 



4280 



TGCCTCAGGGATGOGGAAGCAG 
ACGGAGTCCCTACuCCTTCGTG* 
L P Q G W ' G S T 



4290 env 315-344 (157) «20 



Q 

4330 



lACAATGGGAGCCCCCAGCATXACCCTCACCGTCCAGGC^^ 
TCGTGTTACCCTCGCCGGTCGTAMTGGGAGTCGCAGGTCC^ 

M G A A SXTLTVQ.ARXLLSG I> 



4340 4350 4360 4370 p Q | 451-480 (64) 4400 

GTCCAGCAACAGARCAATCTGCTOGMGGAGAATAGGGAAA 

CAGGTCGT1 G TC' TY GT TA GACGAQCXCC TCTT A T C C C TfT A GGAGTYTCTCCGACACGTACCGCAGATGATGCTAGGGAG 
VQQQXNLlJxENR BILXBPVHGVYYDPS> 

4410 4420 443 0 444 0 4 4 50 V DU 61-81 (136) 4480 

* *■ * * • 

* GAGGAACTGTCCRCCWTGGTGGATATGG 
•STCKTCTTGACAGGYGGWACCACCTATACCCTTTGATGC^ 
XEEL5XXVDHGNYDL> 




4520 



4530 



G V 



D N » 
4570 



A A 



vpr 61-90 (116) 



4560 



GAGTGGACAATAACCTX GCCGC7 ATTAGAAYCCTGOVACAGCTOITGTTCRTTCACTTTAGGATTG 
CTCACCTGTTATTGGAK CGGCGJ rAATCTTRGGACGTTGTCGAGXACAAGYAAGTGAAATCCTAACC 



4S80 



IRXLQQLXFXHFRIGC X H S> 
4590 4600 4610 gag 406-435 (28) * 6 * 0 

AGGATTGGCATCMYCCGTCAGAGAAGGGSCAGJjGC^ 

TC CTAACCGTAGKRGGCAGTCTC TTCC CSGTC'ICG AGGGTC C7 TTl'IlZCCTACGACCTTCACACCGTYTCTXrCCTGTGGT 
RIGIXRQRRX R ! A PRKK GCWKCGXB GHQ> 



4650 



4660 



4670 



4 680 



4690 



4700 



4710 



4720 



GATGAAGGATTGCACTGAGAGACAGGCTAACTTTCTCGGA 
CTACTTCCTWyGTGACTCTCTGTC 

UDr s»trlv* D r vixTYWGL> 



D C T 



vif 61-90(104) 



ERQANPLGK 
4750 4760 



X A R L X 
4770 4780 



4790 



4 800 



ATA CC GGTG AGAGAGACTGGCAS CTCGGC CA WGGCGTCAGC ATTG AGTGGAGC AYAAGGGAAAGGGCTGAGGATAGCGGC 
TATGGCCAC TXITC TCTCACCGTSGAGCCGGTWCCGC AGTC GTAACTCACCTCC TRTTCCCTTTCCCGACTCCTATCGCCG 
HTGBR DWX LGXGVS I E W R X 



E R A E D S G> 
[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 4830 to 5760. Segment labels in order of appearance: vpu 46-75 (135); env 510-539 (170); nef 151-180 (189); pol 961-990 (98); pol 16-45 (35); gag 390-420 (27); gag 421-450 (29); env 465-494 (167); nef 31-60 (181); spacers; vpu 1-30 (132); pol 136-165 (43). Join markers: A7.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 5770 to 6720. Segment labels in order of appearance: spacers; env 255-284 (153); pol 556-585 (71); gag 181-210 (13); pol 706-735 (81); gag 31-50 (3); env 215-244 (151); gag 1-30 (1); nef 91-120 (185); env 16-45 (138); env 330-359 (158). Join markers: B2.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 6750 to 7680. Segment labels in order of appearance: vpr 31-60 (114); vif 151-180 (110); pol 901-930 (94); pol 886-915 (93); gag 256-285 (18); env 495-524 (169); tat 61-90 (122); pol 856-885 (91); gag 196-225 (14); pol 181-210 (46); pol 871-900 (92). Join markers: B2, B3.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 7690 to 8640. Segment labels in order of appearance: pol 211-240 (48); env 540-569 (172); vpr 76-96 (117); spacers; env 155-184 (147); vif 76-105 (105); gag 481-499 (33); spacers; vif 121-150 (108); tat 16-45 (119); pol 976-995 (99); spacers; pol 721-750 (82); vpr 16-45 (113).]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 8650 to 9600. Segment labels in order of appearance: env 106-144 (144); vif 91-120 (106); nef 166-195 (190); pol 151-180 (44); gag 436-465 (30); vif 31-60 (102); gag 346-375 (24); pol 736-765 (83); rev 31-60 (126); gag 226-255 (16); pol 841-870 (90).]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 9610 to 10560. Segment labels in order of appearance: nef 106-135 (186); rev 46-75 (127); gag 301-330 (21); pol 676-705 (79); env 76-105 (142); env 170-199 (148); env 600-629 (176); vif 46-75 (103); spacers; nef 1-30 (179); pol 496-525 (67).]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 10590 to 11520. Segment labels in order of appearance: env 585-614 (175); pol 391-420 (60); env 345-374 (159); pol 631-660 (76); env 420-449 (164); env 285-314 (155); pol 91-120 (40); env 555-584 (173); gag 151-180 (11); nef 46-75 (182); env 630-651 (178); spacers. Join markers: B7, C1.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 11580 to 12480. Segment labels in order of appearance: nef 76-105 (184); pol 481-510 (66); gag 316-345 (22); gag 166-195 (12); gag 241-270 (17); pol 541-570 (70); pol 571-600 (72).]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 12490 to 13440. Segment labels in order of appearance: gag 136-165 (10); env 61-90 (141); env 375-404 (161); vif 136-165 (109); env 230-254 (152); spacers; gag 106-135 (8); pol 826-855 (89); pol 586-615 (73); pol 766-795 (85); pol 691-720 (80). Join markers: C2.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 13470 to 14400. Segment labels in order of appearance: nef 16-45 (180); rev 91-120 (130); pol 526-555 (69); spacers; env 140-169 (146); pol 376-405 (59); gag 331-360 (23); env 360-389 (160); spacers; vif 1-30 (100); env 390-419 (162). Join markers: C3, C4.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 14410 to 15360. Segment labels in order of appearance: vpu 16-45 (133); gag 46-75 (4); vpu 31-60 (134); pol 286-315 (53); gag 376-405 (26); rev 76-105 (129); pol 781-810 (86); env 200-229 (150); pol 406-435 (61); env 121-139 (145); tat 76-102 (123). Join markers: C5.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 15390 to 16320. Segment labels in order of appearance: spacers; rev 61-90 (128); env 450-479 (166); gag 271-300 (19); pol 946-975 (97); pol 226-255 (49); spacers; env 1-30 (137); pol 421-450 (62); nef 181-196 (191); spacers; env 570-599 (174); gag 61-90 (5). Join markers: C5, C6.]

[Figure (continued): annotated nucleotide and amino acid sequence, ruler ticks 16330 to 17190. Segment labels in order of appearance: env 270-299 (154); pol 436-465 (63); gag 361-390 (25); nef 61-90 (183); gag 286-315 (20); gag 16-45 (2); pol 646-675 (77); env 615-644 (177); pol 811-840 (88); pol 511-540 (68); spacers; BglII and EcoRI sites; Flu NP epi (Mouse); Stop. Join markers: C7, C8.]

[Figure: annotated nucleotide and amino acid sequence, ruler ticks 10 to 800.]

Input parent polypeptide sequence(s)

Optional: add alanine spacers to the ends of each polypeptide sequence for processing

Fragment polypeptide sequence(s) into fragments (e.g., 30 aa) which are preferably overlapping (e.g., by 15 aa)

Reverse translate the fragments to provide a nucleic acid sequence for each fragment

Scramble or randomly rearrange the fragments

Have any fragments been placed together to recreate at least a portion of the parent polypeptide sequence?

Yes: scramble the fragments again

No: link the rearranged fragments together to create a synthetic polypeptide sequence and/or a synthetic polynucleotide sequence

Output the synthetic polypeptide sequence and/or the synthetic polynucleotide sequence
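
The fragment, scramble and check steps of the flowchart can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not the Scramble program listed in this specification. The names Segment, fragment, recreates_parent, shuffle, count_adjacent and scramble are hypothetical. The 30 aa window and 15 aa overlap are the example values from the flowchart. The test for "recreating a portion of the parent" (the next fragment starts inside, or immediately after, the previous one) is an assumed criterion. A single parent is assumed, and spacer addition and reverse translation are left out.

```c
#include <stdlib.h>

#define SEG_LEN  30   /* fragment length (flowchart example value) */
#define SEG_STEP 15   /* start-to-start distance, giving a 15 aa overlap */
#define MAX_SEGS 256  /* hypothetical capacity for this sketch */

/* One fragment: zero-based start in the parent and its length. */
typedef struct {
    int offset;
    int length;
} Segment;

/* Cut a parent of parent_len residues into overlapping windows.
   The last window may be shorter than SEG_LEN.
   Returns the number of segments written to segs. */
int fragment(int parent_len, Segment *segs, int max_segs)
{
    int n = 0;
    for (int off = 0; off < parent_len && n < max_segs; off += SEG_STEP) {
        int len = parent_len - off;
        if (len > SEG_LEN) len = SEG_LEN;
        segs[n].offset = off;
        segs[n].length = len;
        n++;
        if (off + len >= parent_len) break;  /* this window reached the end */
    }
    return n;
}

/* Nonzero if placing b directly after a would rebuild a stretch of the
   parent: b begins inside a, or immediately after it (assumed criterion). */
int recreates_parent(Segment a, Segment b)
{
    return b.offset > a.offset && b.offset <= a.offset + a.length;
}

/* Fisher-Yates shuffle using rand(); the caller seeds with srand(). */
void shuffle(Segment *segs, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        Segment tmp = segs[i];
        segs[i] = segs[j];
        segs[j] = tmp;
    }
}

/* Count neighbouring pairs in the current order that rebuild the parent. */
int count_adjacent(const Segment *segs, int n)
{
    int bad = 0;
    for (int i = 0; i + 1 < n; i++)
        if (recreates_parent(segs[i], segs[i + 1])) bad++;
    return bad;
}

/* Reshuffle until no neighbouring pair rebuilds the parent, or give up.
   Returns 1 on success, 0 if max_tries was exhausted. */
int scramble(Segment *segs, int n, int max_tries)
{
    for (int t = 0; t < max_tries; t++) {
        shuffle(segs, n);
        if (count_adjacent(segs, n) == 0) return 1;
    }
    return 0;
}
```

A caller would seed the generator with srand, call fragment on the parent length, and then call scramble with a retry limit. The retry loop corresponds to the "Yes" branch of the flowchart. The limit is a safeguard added for this sketch so that the loop cannot run indefinitely.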
/* Scramble */

/* Includes */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Constant definitions */

/* Version information */

#define VERSION_NO "0.2"
#define VERSION_DATE "04/03/1999"

/* Misc */

#define KEYBOARD_BUFFER_SIZE 256 /* size of keyboard read buffer */
#define LEN_CODON 4 /* length of codon (including null) */
#define BUFFER_SIZE 10000 /* size of file read buffer */
#define TRUE 1 /* boolean true */
#define FALSE 0 /* boolean false */

/* Error codes */

#define E_NOERROR 0 /* no error */
#define E_NOINFILE 1 /* genes file not found */
#define E_MALLOC 2 /* memory allocation error */
#define E_FILEREAD 3 /* file read error */
#define E_CREATE_OUTPUT_FILE 4 /* error creating output file */
#define E_OVERLAP 5 /* segment overlap >= length */

/* Structure definitions */

typedef struct gene GENE;
typedef GENE * P_GENE;
typedef struct gene_segment GENE_SEGMENT;
typedef GENE_SEGMENT * P_GENE_SEGMENT;

struct gene {
    char * name;
    char * data;
    P_GENE next_gene;
};

struct gene_segment {
    P_GENE p_gene;
    int number;
    int offset;
    int first_codon_choice;
    char * amino_data;
    char * dna_data;
    P_GENE_SEGMENT next_seg;
};
/* Function prototypes */

int prolog();
int get_parameters();
int read_int(char * prompt);
int load_genes();
int add_gene(char * gene_name, char * gene_data);
void insert_gene(P_GENE * head, P_GENE new_gene);
int add_aa();
int split_genes();
int split_gene(P_GENE g);
int insert_segment(P_GENE_SEGMENT * head_seg, P_GENE_SEGMENT new_seg);
int convert_segments_aa_to_dna();
int convert_aa_to_dna(char * aa_ptr, char * dna_ptr, int first_choice);
char * codon(char acid_char, int preferred);
int perform_scramble();
int scramble_segments();
int adjacent_segments();
int display_genes();
int write_output_file();
void strip_newline(char * strip_str);
void pad_amino_string(char * amino_ptr, char * padded_ptr);
int even(int test_num);
void read_str(char * prompt, char * string);
char * read_nonblank_line(char * buf, int buf_size, FILE * in_file);
int user_confirmation();
void test();

/* Global variables */

char * codon_table[26][2] = {
/* A 00 */ {"GCC","GCT"},
/* - 01 */ {"???","???"},
/* C 02 */ {"TGC","TGT"},
/* D 03 */ {"GAC","GAT"},
/* E 04 */ {"GAG","GAA"},
/* F 05 */ {"TTC","TTT"},
/* G 06 */ {"GGC","GGA"},
/* H 07 */ {"CAC","CAT"},
/* I 08 */ {"ATC","ATT"},
/* - 09 */ {"???","???"},
/* K 10 */ {"AAG","AAA"},
/* L 11 */ {"CTG","CTC"},
/* M 12 */ {"ATG","ATG"},
/* N 13 */ {"AAC","AAT"},
/* - 14 */ {"???","???"},
/* P 15 */ {"CCC","CCT"},
/* Q 16 */ {"CAG","CAA"},
/* R 17 */ {"AGG","AGA"},
/* S 18 */ {"AGC","TCC"},
/* T 19 */ {"ACC","ACA"},
/* - 20 */ {"???","???"},
/* V 21 */ {"GTG","GTC"},
/* W 22 */ {"TGG","TGG"},
/* - 23 */ {"???","???"},
/* Y 24 */ {"TAC","TAT"},
/* - 25 */ {"???","???"}
};

char * error_text[] = {
/* 00 */ ""
/* 01 */ ,"ERROR: Input file not found!"
/* 02 */ ,"ERROR: Memory allocation error"
/* 03 */ ,"ERROR: File read error"
/* 04 */ ,"ERROR: Could not create output file"
/* 05 */ ,"ERROR: Segment overlap must be less than segment length"
};

char disease_name[KEYBOARD_BUFFER_SIZE];
char input_file_name[KEYBOARD_BUFFER_SIZE];
char output_file_name[KEYBOARD_BUFFER_SIZE];
int num_genes = 0;
int num_segments = 0;
int len_segment;
int segment_overlap;
P_GENE first_gene = NULL;
P_GENE_SEGMENT first_segment = NULL;
P_GENE_SEGMENT * scrambled_segments = NULL;

/* Mainline */

int main() {
    int error = E_NOERROR;

    printf("Scramble - Version %s, %s\n\n",VERSION_NO,VERSION_DATE);

    /* Initial processing */
    if (!error)
        error = prolog();

    /* Get various program parameters from user */
    if (!error)
        error = get_parameters();

    /* Load genes from genes file */
    if (!error)
        error = load_genes();

    /* Add 'AA' to start and end of all genes */
    if (!error)
        error = add_aa();

    /* Split genes into overlapping chunks */
    if (!error)
        error = split_genes();

    /* Convert segment amino acid to dna */
    if (!error)
        error = convert_segments_aa_to_dna();

    /* Scramble the segments */
    if (!error)
        error = perform_scramble();

    /* Write output file */
    if (!error)
        error = write_output_file();

    /* Show error if there was one */
    if (error)
        printf("%s",error_text[error]);

    return error;
}



/* prolog() */
/* Perform any initial processing required */

int prolog() {
    /* Seed the random number generator, using the system clock */
    /* Don't run the program more than once in the same second! */
    /* Or we'll get the same randomisation! */
    srand(time(NULL));

    return E_NOERROR;
}



/* get_parameters() */
/* Ask for various parameters from the user (stdin) */
/* Disease name */
/* Input file name */
/* Output file name */
/* Segment length */

int get_parameters() {
    int valid;

    read_str("Enter disease name : ",disease_name);
    read_str("Enter input file name : ",input_file_name);
    read_str("Enter output file name : ",output_file_name);

    valid = FALSE;
    while (!valid) {
        len_segment = read_int("Enter segment length : ");
        if (len_segment % 2)
            printf("Segment length must be even!\n");
        else
            valid = TRUE;
    }

    /* The overlap is always half the segment length */
    segment_overlap = len_segment / 2;
    return E_NOERROR;
}



/* load_genes() */
/* Load the genes from the input file */

int load_genes() {
    FILE * input_file;
    char name_buf[BUFFER_SIZE];
    char data_buf[BUFFER_SIZE];
    int rc = E_NOERROR; /* initialised so an empty file returns no error */

    /* Open genes file for reading */
    if (NULL == (input_file = fopen(input_file_name,"r")))
        return E_NOINFILE;

    printf("Loading genes from: %s\n",input_file_name);

    num_genes = 0;

    /* Read gene name */
    while (NULL != read_nonblank_line(name_buf,BUFFER_SIZE,input_file)) {
        /* Read the gene data */
        if (NULL != read_nonblank_line(data_buf,BUFFER_SIZE,input_file)) {
            /* Allocate memory for new gene and add to list */
            if ((rc = add_gene(name_buf,data_buf)))
                break;
        }
    }

    /* Close genes file */
    fclose(input_file);

    return rc;
}

/* add_gene() */
/* Allocate memory for new gene, then insert in list */

int add_gene(char * gene_name, char * gene_data) {
    P_GENE new_gene;

    /* Allocate storage for new gene */
    if (NULL == (new_gene = malloc(sizeof(GENE))))
        return E_MALLOC;
    /* Initialise new gene */
    new_gene->next_gene = NULL;
    /* Allocate storage for gene name (+1 for null) */
    if (NULL == (new_gene->name = malloc(strlen(gene_name)+1)))
        return E_MALLOC;
    /* Store gene name */
    strcpy(new_gene->name,gene_name);
    /* Allocate storage for gene data (+1 for null) */
    if (NULL == (new_gene->data = malloc(strlen(gene_data)+1)))
        return E_MALLOC;
    /* Store gene data */
    strcpy(new_gene->data,gene_data);
    /* Insert the new gene into linked list */
    insert_gene(&first_gene,new_gene);
    /* Increment num_genes */
    num_genes++;

    return E_NOERROR;
}

/* insert_gene() */
/* Insert gene into linked list */

void insert_gene(P_GENE * head_gene, P_GENE new_gene) {
    P_GENE * cur_ptr = head_gene;

    /* Walk to the end of the list, then append */
    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_gene);

    *cur_ptr = new_gene;
}

/* add_aa() */
/* Add 'AA' to the start and end of every gene */

int add_aa() {
    P_GENE cur_gene = first_gene;
    char * new_data;

    while (NULL != cur_gene) {
        /* Allocate storage to fit the gene plus four characters (+1 for null) */
        if (NULL == (new_data = malloc(strlen(cur_gene->data)+5)))
            return E_MALLOC;
        /* Shift gene data to new storage, add 'AA' */
        strcpy(new_data,"AA");
        strcat(new_data,cur_gene->data);
        strcat(new_data,"AA");
        /* Free previous gene data storage */
        free(cur_gene->data);
        /* Set gene data pointer to new storage */
        cur_gene->data = new_data;
        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;
    }

    return E_NOERROR;
}

/* split_genes() */
/* Split the genes into overlapping segments */

int split_genes() {
    P_GENE cur_gene = first_gene;
    P_GENE_SEGMENT cur_seg = first_segment;

    printf("Splitting genes into segments...\n");

    /* Split the genes into segments */
    while (NULL != cur_gene) {
        /* Split the gene */
        split_gene(cur_gene);
        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;
    }

    /* Count the number of segments */
    num_segments = 0;
    cur_seg = first_segment;
    while (NULL != cur_seg) {
        num_segments++;
        cur_seg = cur_seg->next_seg;
    }

    return E_NOERROR;
}



/* split_gene() */
/* Split a gene into overlapping segments */

int split_gene(P_GENE g) {
    char * seg_ptr;
    char * seg_buf;
    P_GENE_SEGMENT new_segment = NULL;
    int done;
    int seg_ctr = 0;

    /* Allocate memory for segment buffer */
    if (NULL == (seg_buf = malloc(len_segment+1)))
        return E_MALLOC;

    /* Insert a null at the end of the segment buffer, */
    /* so we can use it as a string */
    seg_buf[len_segment] = '\0';

    /* Set segment pointer to start of gene data */
    seg_ptr = g->data;

    done = FALSE;
    while (!(done)) {
        /* So we know if we copied data */
        seg_buf[0] = '\0';

        /* Copy a segment of gene data to the segment buffer. */
        /* strncpy stops at the end of the gene data and pads with nulls, */
        /* so nothing is read past the terminating null. */
        strncpy(seg_buf,seg_ptr,len_segment);

        /* If there was some gene data copied to the buffer */
        if ('\0' != seg_buf[0]) {
            /* Allocate storage for a new segment */
            if (NULL == (new_segment = malloc(sizeof(GENE_SEGMENT))))
                return E_MALLOC;
            /* Increment segment counter */
            seg_ctr++;
            /* Setup the new segment (offset is 1-based) */
            new_segment->p_gene = g;
            new_segment->number = seg_ctr;
            new_segment->offset = seg_ptr - g->data + 1;
            new_segment->next_seg = NULL;
            if (NULL == (new_segment->amino_data = malloc(len_segment+1)))
                return E_MALLOC;
            if (NULL == (new_segment->dna_data = malloc(len_segment*3+1)))
                return E_MALLOC;
            new_segment->amino_data[0] = '\0';
            new_segment->dna_data[0] = '\0';
            /* Copy segment data from buffer to new segment */
            strcpy(new_segment->amino_data,seg_buf);
            /* Insert new segment into chain from gene */
            insert_segment(&first_segment,new_segment);
        }

        /* If we didn't read a full segment, we are finished! */
        if ((int)strlen(seg_buf) < len_segment)
            done = TRUE;
        /* Otherwise, advance segment pointer to next segment in buffer */
        else
            seg_ptr = seg_ptr + len_segment - segment_overlap;
    }

    return E_NOERROR;
}

/* insert_segment() */
/* Insert a segment node at the end of the list */

int insert_segment(P_GENE_SEGMENT * head_seg, P_GENE_SEGMENT new_seg) {
    P_GENE_SEGMENT * cur_ptr = head_seg;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_seg);

    *cur_ptr = new_seg;

    return E_NOERROR;
}

/* convert_segments_aa_to_dna */
/* Go thru segments, and for each, convert amino acids to dna */

int convert_segments_aa_to_dna() {
    P_GENE_SEGMENT cur_seg = first_segment;
    int first_choice = 1;
    int alternate;

    printf("Converting to DNA...\n");

    /* Work out if we need to alternate the first codon choice or not */
    /* Don't need to do this anymore, since the segment length is */
    /* forced to be even, and the overlap is half the length (odd). */
    /* alternate = ((even(len_segment) && even(segment_overlap))
                 || (!even(len_segment) && !even(segment_overlap))); */

    alternate = FALSE;

    while (NULL != cur_seg) {
        cur_seg->first_codon_choice = first_choice;
        convert_aa_to_dna(cur_seg->amino_data,cur_seg->dna_data,
                          cur_seg->first_codon_choice);
        /* Address next segment */
        cur_seg = cur_seg->next_seg;

        /* If we are alternating, alternate the first codon choice */
        /* if (alternate)
               if (1 == first_choice)
                   first_choice = 2;
               else
                   first_choice = 1; */
    }

    return E_NOERROR;
}



/* convert_aa_to_dna */
/* Converts a string of amino acid to dna */
/* NOTE: assumes that buffer at dna_ptr is large enough to hold dna!!! */

int convert_aa_to_dna(char * aa_ptr, char * dna_ptr, int first_choice) {
    char * p_codon;
    int cur_preferred = first_choice;

    while ('\0' != *aa_ptr) {
        p_codon = codon(*aa_ptr,cur_preferred);
        strcat(dna_ptr,p_codon);

        /* If we didn't find a codon, log a warning */
        if (0 == strcmp(p_codon,"???"))
            printf("WARNING: no codon found for amino acid!\n");

        /* Alternate current preferred codon */
        if (1 == cur_preferred)
            cur_preferred = 2;
        else
            cur_preferred = 1;

        aa_ptr++;
    }

    return E_NOERROR;
}

/* codon */
/* Returns a pointer to a codon corresponding to the amino acid passed */
/* The codon pointer is to 3 characters, plus a terminating null */

char * codon(char acid_char, int preferred) {
    int codon_table_index;
    char * codon_ptr;

    /* Determine index into codon_table (table starts at 'A') */
    codon_table_index = acid_char - 'A';

    /* Set pointer to appropriate codon */
    codon_ptr = codon_table[codon_table_index][preferred-1];

    return codon_ptr;
}

/* display_genes() */
/* Display the name and data for all genes */

int display_genes() {
    P_GENE cur_gene = first_gene;

    while (NULL != cur_gene) {
        printf("%s\n",cur_gene->name);
        printf("%s\n",cur_gene->data);
        cur_gene = cur_gene->next_gene;
    }

    return E_NOERROR;
}

/* perform_scramble() */
/* Scramble the segments */
/* Check for adjacent segments. If there are, rescramble */

int perform_scramble() {
    int done = FALSE;
    int rc = E_NOERROR;

    while (TRUE) {
        rc = scramble_segments();
        if (E_NOERROR == rc) {
            if (adjacent_segments()) {
                printf("Adjacent segments detected! Rescramble? (y/n) ");
                if (!user_confirmation()) {
                    printf("WARNING: Adjacent segments in output file.\n");
                    break;
                }
            }
            else
                break;
        }
        else
            break;
    }

    return rc;
}

/* scramble_segments() */
/* Randomly scramble the segments, putting pointers in scrambled_segments[] */

int scramble_segments() {
    P_GENE_SEGMENT cur_seg = first_segment;
    int i,j;
    P_GENE_SEGMENT temp;

    printf("Scrambling segments...\n");

    /* Release the array from any previous scramble (free(NULL) is a no-op) */
    free(scrambled_segments);

    /* Allocate storage for array of segment pointers */
    if (NULL == (scrambled_segments = malloc(sizeof(P_GENE_SEGMENT)*num_segments)))
        return E_MALLOC;

    /* First, initialise scrambled_segments in same order as linked list */
    i = 0;
    while (cur_seg != NULL) {
        scrambled_segments[i] = cur_seg;
        cur_seg = cur_seg->next_seg;
        i++;
    }

    /* Now, randomly scramble the segments */
    for (i=0;i<num_segments;i++) {
        j = rand() % num_segments;
        temp = scrambled_segments[i];
        scrambled_segments[i] = scrambled_segments[j];
        scrambled_segments[j] = temp;
    }

    return E_NOERROR;
}

/* adjacent_segments() */
/* Determine if the scrambled segment order has resulted in */
/* two segments which were adjacent originally (ie every */
/* second one) have ended up adjacent. */

int adjacent_segments() {
    int i;
    int rc = 0;
    P_GENE_SEGMENT cur_seg;
    P_GENE_SEGMENT next_seg;

    for (i=0;i<num_segments-1;i++) {
        /* Address current and next segments */
        cur_seg = scrambled_segments[i];
        next_seg = scrambled_segments[i+1];

        /* Do segments come from same gene, and are two apart? */
        if (((cur_seg->p_gene == next_seg->p_gene)
            && ((cur_seg->number == (next_seg->number)+2)
            || (cur_seg->number == (next_seg->number)-2))))
            return 1;
    }

    return 0;
}

/* write_output_file() */
/* Write out segments (in initial non-scrambled order) */
/* Write out synthetic protein (in scrambled order) */
/* Write out synthetic dna (in scrambled order) */

int write_output_file() {
    FILE * output_file;
    char * amino_buffer;
    P_GENE_SEGMENT cur_seg;
    int i;

    /* Open output file for writing (erase any contents) */
    if (NULL == (output_file = fopen(output_file_name,"w")))
        return E_CREATE_OUTPUT_FILE;

    /* Allocate memory for padded amino string buffer */
    if (NULL == (amino_buffer = malloc(len_segment*3+1)))
        return E_MALLOC;

    printf("Writing output file: %s\n",output_file_name);

    /* Write output file header information */
    fprintf(output_file,"Scramble %s - Output File\n",VERSION_NO);
    fprintf(output_file,"\n");
    fprintf(output_file,"Disease name : %s\n",disease_name);
    fprintf(output_file,"Input filename : %s\n",input_file_name);
    fprintf(output_file,"Output filename : %s\n",output_file_name);
    fprintf(output_file,"Number genes : %d\n",num_genes);
    fprintf(output_file,"Number segments : %d\n",num_segments);
    fprintf(output_file,"Segment length : %d\n",len_segment);
    fprintf(output_file,"Segment overlap : %d\n",segment_overlap);

    /* Write out segments in initial non-scrambled order */
    fprintf(output_file,"\n");
    fprintf(output_file,"Segments in original order\n");
    fprintf(output_file," \n");
    cur_seg = first_segment;
    while (NULL != cur_seg) {
        /* Format amino data to line up with codons */
        pad_amino_string(cur_seg->amino_data,amino_buffer);
        fprintf(output_file,"Gene : %s\n",cur_seg->p_gene->name);
        fprintf(output_file,"Segment# : %d\n",cur_seg->number);
        fprintf(output_file,"Offset : %d\n",cur_seg->offset);
        fprintf(output_file,"1st Codon : %d\n",cur_seg->first_codon_choice);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",cur_seg->dna_data);
        fprintf(output_file,"\n");
        cur_seg = cur_seg->next_seg;
    }

    /* Write out segment names in scrambled order */
    fprintf(output_file,"Segments in scrambled order\n");
    fprintf(output_file," \n");
    for (i=0;i<num_segments;i++) {
        /* Format amino data to line up with codons */
        pad_amino_string(scrambled_segments[i]->amino_data,amino_buffer);
        /* Write segment details */
        fprintf(output_file,"%s #%d\n",scrambled_segments[i]->p_gene->name,
                scrambled_segments[i]->number);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",scrambled_segments[i]->dna_data);
        fprintf(output_file,"\n");
    }

    /* Write synthetic protein in one long string */
    fprintf(output_file,"Synthetic Protein:\n");
    fprintf(output_file," \n");
    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->amino_data);
    fprintf(output_file,"\n\n");

    /* Write synthetic dna in one long string */
    fprintf(output_file,"Synthetic DNA:\n");
    fprintf(output_file," \n");
    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->dna_data);

    /* Release the padding buffer and close the output file */
    free(amino_buffer);
    fclose(output_file);

    return E_NOERROR;
}

/* strip_newline() */
/* Replace the first newline character with a null */

void strip_newline(char * strip_str) {
    char * newline_pos;

    /* Find the newline char */
    newline_pos = strchr(strip_str,'\n');

    /* If we found one, replace it with a null */
    if (NULL != newline_pos)
        newline_pos[0] = '\0';
}

/* pad_amino_string */
/* Copy amino chars from amino_ptr to padded_ptr, padding each */
/* side with a space. */

void pad_amino_string(char * amino_ptr, char * padded_ptr) {
    while ('\0' != *amino_ptr) {
        *padded_ptr = ' ';
        padded_ptr++;
        *padded_ptr = *amino_ptr;
        padded_ptr++;
        *padded_ptr = ' ';
        padded_ptr++;
        amino_ptr++;
    }

    /* Stick a null at the end of the padded string */
    *padded_ptr = '\0';
}



/* even() */
/* True if test_num is even, otherwise false */

int even(int test_num) {
    return !(test_num % 2);
}



/* read_int() */
/* Read an integer from stdin. Keep trying until valid int > 0 entered. */
/* Return the integer read, or 0 if error reading from stdin. */

int read_int(char * prompt) {
    char buffer[KEYBOARD_BUFFER_SIZE];
    int value_read;
    int valid = FALSE;

    while (!valid) {
        printf("%s",prompt);
        valid = TRUE;
        fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
        if (1 != sscanf(buffer,"%d",&value_read))
            valid = FALSE;
        if (valid && (value_read < 1))
            valid = FALSE;
        if (!valid)
            printf("Positive integer value please!\n");
    }

    return value_read;
}

/* read_str() */
/* Read a string from the user (stdin) */
/* Strip the newline from it */

void read_str(char * prompt, char * string) {
    char buffer[KEYBOARD_BUFFER_SIZE];

    /* Print the prompt as data, not as a format string */
    printf("%s",prompt);
    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    sscanf(buffer,"%s",string);
}



/* read_nonblank_line() */
/* Read a line from file until we get a non-blank one */

char * read_nonblank_line(char * buf, int buf_size, FILE * in_file) {
    char * return_ptr;

    /* Read lines until we get a non-blank one, or EOF */
    do
        return_ptr = fgets(buf,buf_size,in_file);
    while ((NULL != return_ptr) && (('\n' == buf[0]) || (' ' == buf[0])));

    /* If we got a line, change the newline char to a null */
    if (NULL != return_ptr)
        strip_newline(buf);

    return return_ptr;
}

/* user_confirmation() */
/* Read input from user. If user types y, return 1, otherwise 0 */

int user_confirmation() {
    char buffer[KEYBOARD_BUFFER_SIZE];

    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    if (('y' == buffer[0]) || ('Y' == buffer[0]))
        return 1;
    else
        return 0;
}

/* test() */
/* For debugging/development */

void test() {
    char str[100];

    printf("Enter something: ");
    fgets(str,100,stdin);
    printf("line1\n");
    printf("%s",str);
    printf("line2\n");
    fgets(str,100,stdin);
}
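
The scramble-and-check loop of the listing can also be sketched in a self-contained form. The sketch below is an illustration under stated assumptions, not the listing's method: the type seg_t and the functions shuffle_segments, has_adjacent and scramble_until_clean are hypothetical names, the shuffle is a Fisher-Yates shuffle swapped in for the listing's swap-with-any-position loop, and the reshuffle is automatic rather than confirmed by the user. The adjacency test mirrors the listing: two neighbouring segments from the same parent whose segment numbers differ by two are treated as adjacent, because with an overlap of half the segment length such segments lie end to end in the parent sequence.

```c
#include <stdlib.h>

/* Hypothetical segment record for this sketch: which parent sequence the
   segment came from and its 1-based number within that parent. */
typedef struct {
    int gene_id;
    int number;
} seg_t;

/* Fisher-Yates shuffle of the segment array. */
static void shuffle_segments(seg_t *segs, int n)
{
    int i;
    for (i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);    /* slight modulo bias is ignored here */
        seg_t tmp = segs[i];
        segs[i] = segs[j];
        segs[j] = tmp;
    }
}

/* Returns 1 if any two neighbouring entries come from the same parent and
   their segment numbers differ by exactly 2, otherwise 0. */
static int has_adjacent(const seg_t *segs, int n)
{
    int i;
    for (i = 0; i + 1 < n; i++) {
        int d = segs[i].number - segs[i + 1].number;
        if (segs[i].gene_id == segs[i + 1].gene_id && (d == 2 || d == -2))
            return 1;
    }
    return 0;
}

/* Shuffle repeatedly until no adjacent pair remains, giving up after
   max_tries attempts. Returns 1 on success, 0 if every attempt failed. */
static int scramble_until_clean(seg_t *segs, int n, int max_tries)
{
    int t;
    for (t = 0; t < max_tries; t++) {
        shuffle_segments(segs, n);
        if (!has_adjacent(segs, n))
            return 1;
    }
    return 0;
}
```

A caller would seed the generator once with srand() before the first shuffle, as prolog() does in the listing.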
HepC Savine design

HepC 1a consensus polyprotein sequence used for Scramble program
MSTOPIO>QRICTKRNTORRPQDVKFPGGGQI 

PGYPWPLYGNEGCGWAGWIJ^PRGSRPSWGPTDPRRRSRNIX3KVID^^ 

VLEIXJVNYATGNLPGCSFSIFLLTUjLSCLTVPASAYQVR^ 

ASRCWAMTPTVATRIX?KLPATQLRRHIDLLVGSATLC^ 

I TGHRMAWDMMMNWS PTAAI*VMAQLLRI PQAI LDM I AGAHWGVLAG I A YFSMVGNWAKVL VVLLL FAG VDAETHVTGG 
NAGRTTSGLVSIjIiTPGAKQNIQLINTOGSWHINST^^ 
WGPISYANGSGPDQRPYCWHYPPKPOSIVPAKSVTCPVYCOT^ 
GNWFGCTWMNSTGFTKVCGAPPCVIGGAGNNTIJICPTO 

TI FKVRM YVGGVEHRLEAACNWTRGERCDLEDRDRSELS PLLLSTTQWQ VLPCS FTTLP ALSTGL IHLHQNI VD VQ YL 

YGVGSSIASWAIKWEYVVLLFXLLADARVCSCLWMMLLISQAEAAL 

RWVPGAVYALYGMWPLLUJjIJUiPQ 

VWVPPLNVRGGRDAVIIO,MCVVHFniVFDITKLLIAVFGPL^ 
AI IKLGALTGTYVYNHLTPIJUJWAHNGLRDIiAVAVEPVV^ 

ADGMVSKGWRLLAP I TAYAQQTRGLIiGCI I TSLTGRDKNQVEGBVQI VSTAAQTFLATC INGVCWTVYHGAGTRTI AS 
PKGPVIQMYTW7DQDLVGWPAP(X3SRSLT^ 

CPAGHAVGI FRAAVCTRGVAKAVDFI PVENLETTMRSPVFTDNSS PPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQG 

YKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGS PITYSTYGKFLADGGCSGGAYDI I ICDECHSTDATS I 

LG I GTVLDQAETAGARLVVLATATPPGS VTVPHPN I EEVALS TTGE I P FYGKAI PLEVI KGGRHLI FCHS KKKCDELA 

AKLVAIXSINAVAYYRGLDVSVIPTSGDWVVATD^ 

SRTQRRGRTGRGKPGIYRFVAPGBRPSGMFDSSVLCECYDAGCAWY^ 

VFTGLTHIDAHFIiSQTKQSGENFPYLVAYQATVCARAQAPPPSWDQMWKCLIRLKP^ 

HPVTKYIMTCMSADLEVVTSTimjVGGVLAALAAYCI^TC 

YIEQGMMLAEQFKQKAI^IAQTASRQAEVIA^ 

AAVTS PLTTSQTliI»FN I LGGWVAAQIiAAPG AATA FVG AGLAGAAI G SVGLG KVLVD I LAG YGAG VAGALVAFK I M S GE 
VPSTEDLVNLLPAII^PGALVVGWCAAILRRHVGPGEGAVQWMNRLI 
LTVTQLLRRiaQWISSECITPCSGSWIjRDIWDWICEV^ 
CHCX5AEITGHVKNGTMRIVGPRTCRNMWSGTFPINAYTTGPCT 

TDNLKCPCQVPSPEFFTELDGVRLHRFAPPCKPLLREEVS FRVGLHEYPVGSQLPCEPEPDVAVLTSMLTDPSHI TAE 
AAGRRLARGS PP SMAS S S AS QLS APSLKATCTANHDS PDAEL I EANLL WRQEMGGNI TRVESKNKWI LDS FDPLVAE 
EDEREISVPABILRKSRRFAQALPVWARPDYNPPLVETW^ 

STALAEIATKSFGS SSTSGITGDNTTTS SEPAPSGCPPDSDAES YS SMPPLEGEPGDPDLSDGSWSTVSSEAGTED W 
CXZSMSYSWTGALVTPCAAEEQKLPINALSNSIjLRHHNLv^ 
KVKANLI*SVEEACSLTPPHSAKSKFGYGAKDVRCHARKAVAHINSW 
KPARLr^PDLGVRVCBKMALYDWSKLPIAVMGSSYGFQYSPGQRVEFLVQAWKSKCT 

IRTEEAI YQCCDLDPQARVAI KSLTERliWGGPLTNSRGENCGYiyiCRASGVLTTSCGNTLTCYI KARAACRAAGLQD 
CTMLVCGDDLWI CESAGVQEDAASIlRAFTEA^^^R YSAPPGDPPQPE YDLEL I TSCS SNVSVAHDGAGKRVYYLTRDP 
TTPLiARAAWETARHTPVNS WLGNI I M F APTL WARM I LMTHFFS VL I ARDQLEQALDCE I YGACYS I EPLDLP PI IQRL 
HGLSAFSIJISYSPGEINRVAACIaRJOX3VPPLRATO 
IjDI»SGWFTAGYSGGDIYHSVSHARPRWFWFCLLIjI*AAGVGIYIjLPNR 



Scramble - Output File

Scramble version : 0.1 beta, 08/02/1999
Num. genes : 1
Num. segments : 201
Segment length : 30
Segment overlap : 15

Segments in original order:
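
Each segment record pairs an amino acid line with a DNA line obtained by reverse translation. A minimal sketch of that step is given here in C. It is an illustration only: the names codon_pairs and reverse_translate are hypothetical, and the codon pairs are those legible in the program listing and consistent with the segment records. The sketch alternates between the first and second codon choice from one residue to the next, starting with the first choice.

```c
#include <stddef.h>
#include <string.h>

/* Two codon choices per amino acid letter, indexed from 'A'.
   Letters that are not standard amino acids have no entry (NULL). */
static const char *const codon_pairs[26][2] = {
    {"GCC", "GCT"}, /* A */  {NULL, NULL},   /* B */
    {"TGC", "TGT"}, /* C */  {"GAC", "GAT"}, /* D */
    {"GAG", "GAA"}, /* E */  {"TTC", "TTT"}, /* F */
    {"GGC", "GGA"}, /* G */  {"CAC", "CAT"}, /* H */
    {"ATC", "ATT"}, /* I */  {NULL, NULL},   /* J */
    {"AAG", "AAA"}, /* K */  {"CTG", "CTC"}, /* L */
    {"ATG", "ATG"}, /* M */  {"AAC", "AAT"}, /* N */
    {NULL, NULL},   /* O */  {"CCC", "CCT"}, /* P */
    {"CAG", "CAA"}, /* Q */  {"AGG", "AGA"}, /* R */
    {"AGC", "TCC"}, /* S */  {"ACC", "ACA"}, /* T */
    {NULL, NULL},   /* U */  {"GTG", "GTC"}, /* V */
    {"TGG", "TGG"}, /* W */  {NULL, NULL},   /* X */
    {"TAC", "TAT"}, /* Y */  {NULL, NULL}    /* Z */
};

/* Reverse translate the amino acid string aa into dna, alternating between
   the first and second codon choice for successive residues.
   dna must hold at least 3*strlen(aa)+1 bytes.
   Returns 0 on success, -1 on an unsupported residue or a small buffer. */
static int reverse_translate(const char *aa, char *dna, size_t dna_size)
{
    size_t i, n = strlen(aa);
    int choice = 0;                       /* 0 = first codon, 1 = second */

    if (dna_size < 3 * n + 1)
        return -1;
    for (i = 0; i < n; i++) {
        const char *codon;
        if (aa[i] < 'A' || aa[i] > 'Z')
            return -1;
        codon = codon_pairs[aa[i] - 'A'][choice];
        if (codon == NULL)
            return -1;
        memcpy(dna + 3 * i, codon, 3);
        choice = !choice;                 /* alternate for the next residue */
    }
    dna[3 * n] = '\0';
    return 0;
}
```

For the residues NTNRRPQDVKF at the start of segment 2, the sketch yields AACACAAACAGAAGGCCTCAGGATGTGAAATTC, which agrees with the legible start of that segment's DNA line.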



Gene : HepCla 

Segment# : 1
Offset : 1
1st Codon : 1

AAMSTNPKPQRKTKRNTNRRPQDVKFPGGG
GCCXXTITVrGTLtJVCCAATCCCAAACC^ 
Gene : HepCla

Segment# : 2
Offset : 16
1st Codon : 1

NTNRRPQDVKFPGGGQIVGGVYLLPRRGPR 
AACACAAACAGAAGGCCTCAGGATGTGAAATTCCC^^ 



Gene : HepCla 

Segment# : 3 
Offset : 31 
1st Codon : 1 

QIVGGVYLLPRRGPRLGVRATRKTSERSQP 
CAGATTGTGGGAGGCGTCTACXrrCCTGCCTAG^ 



Gene : HepCla 

Segment # : 4 
Offset : 46 
1st Codon : 1 

LGVRATRKTSBRSQPRGRRQ PIPKARRPEG 
CTGGGAGTGAGAGTCACAAGGAAAACCTCCGAGAGAAGCO 



Gene : HepCla 

Segment # : 5 
Offset : 61 
1st Codon : 1 

RGRRQPI PKARRPEGRTWAQPGYPWPLYGN 
AGGGGAAGGAGACAGCXTTATCCCTAAGGCTAGGAGACCCGAAGGC^ 



Gene : HepCla

Segment# : 6
Offset : 76
1st Codon : 1

RTWAQPGYPWPLYGNEGCGWAGWLLSPRGS
AGGACAT

Gene : HepCla

Segment# : 7
Offset : 91
1st Codon : 1

EGCGWAGWLLSPRGSRPSWGPTDPRRRSRN
GAGGGAT
ACCCACAGACCCTAGGAGAAGGTCCAGGAAT

Gene : HepCla

Segment# : 8
Offset : 106
1st Codon : 1

RPSWGPTDPRRRSRNLGKVIDTLTCGFADL
AGGCCTAGCTGGGGCCCTACCGATCCCAGAA^

Gene : HepCla

Segment# : 9
Offset : 121
1st Codon : 1

LGKVIDTLTCGFADLMGYIPLVGAPLGGAA
CTGGGAAAGGTCATCGA
tTCTGATGGGCTATAT



Gene : HepCla 

Segments : 10 
Offset : 136 
1st Codon : 1 

MGYI PLVGAPLGGAARALAHGVRVLBDGVN 
ATGGGATACATTCXXCTCGTCGGAGCCCCTCTGGGA^ 



Gene : HepCla 

Segment* : 11 
Offset : 151 
1st Codon : 1 

RALAHGVRVLBDGVNYATGNLPGCSFSI PL 
AGGGCTCTGGCTCACGGAGTGAGAGTGCTCGAGGATC 



Gene : HepCla

Segment# : 12
Offset : 166
1st Codon : 1

YATGNLPGCSFSIFLLALLSCLTVPASAYQ 
TAOGCIACCGGAAACCTCCCCGGATGCItAr r i'L'ltJCATCT 



Gene : HepCla

Segment# : 13
Offset : 181
1st Codon : 1

LALLSCLTVPASAYQVRNSTGLYHVTNDCP 
CTGGCTCTGCTCAGCTGTCTGACAGTGCCTG^ 

Gene : HepCla 

Segment# : 14
Offset : 196
1st Codon : 1 

VRNSTGLYHVTNDCPNSS IVYEAADAILHT 
GTGAGAAACTCCACCGGACTGTATCACGTCACCAATGACTGTC 

Gene : HepCla 

Segment# : 15
Offset : 211 
1st Codon : 1 

N S S I V Y EAADAILHTPGCVPCVREGNASRC 
AACTCCAGCATTGTGTATGAGGCTGCCGATGCCA 

Gene : HepCla 

Segment# : 16
Offset : 226 
1st Codon : 1 

PGCVPCVRBGNASRCWVAMT PTVATRDGKI* 
CCX^GATGCGTCCCCTGTGTGAGAGAGGGAAACGC^ 

Gene : HepCla

Segment# : 17
Offset : 241 
1st Codon : 1 

HVAMTPTVATRDGKLPATQLRRHIDLLVGS 
TGGGTCGCCATGACCCCTAOCGTCGCCACAAGGGATG 

Gene : HepCla 

Segments : 18 
Offset : 256 
1st Codon : 1 

PATQLRRHIDLLVGSATLCSALYVGDLCGS 
LVCt^rACCCAACTGAGAAGGCATATCGATCT^^ 

Gene : HepCla 

Segments : 19 
Offset : 271 
1st Codon : 1 

ATLCSALYVGDLCGSVPLVGQLPTFSPRRH 
GCCACACTGTGTAGCGCTClXnATGTGGGAG 

Gene : HepCla 

Segments : 20 
Offset : 286 
1st Codon : 1 

VFLVGQLFTFSPRRHWTTQGCNCSIYPGHI 
GTGTr TC TGGTOG GCCA ACTGTTTACCTTTAGC^ 

Gene : HepCla 

Segments : 21 
Offset : 301 
1st Codon : 1 

WTTQGCNCSI YPGHITGHRMAWDHMMNWS P 
TGGACAACCCAAGGCTCTAACrGTAGCATTTACCCT G 

Gene : HepCla 

Segments : 22 
Offset : 316 
1st Codon : 1 

TGHRHAWDHMMNWS PTAALVHAQLLRI P Q A 
ACCGGACACAGAAlXjUCriXA^TATGATGATGAATT^ 



Gene : HepCla

Segment# : 23
Offset : 331
1st Codon : 1

TAALVHAQLLRI P Q A I LDMIAGAHWGVLAG 
ACCGCTGCCCTOGTGATCGCCCyUiCTGCTC 

Gene : HepCla 

Segment # : 24 
Offset : 346 
1st Codon : 1 

ILDMIAGAHWGVLAGIAYPSMVGNWAKVLV 
ATCCTCGACATGATCGCTGGCGCTCACICGGG 

Gene : HepCla 

Segment* : 25 
Offset : 361 
1st Codon : 1 

IAYFSMVGNWAKVLVVLLLFAGVDAETHVT 
ATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAG 

Gene : HepCla 

Segment* : 26 
Offset : 376 
1st Codon : 1 

VLLLFAGVDABTHVTGGNAGRTTSGLVSLL 
GTGCTCCrGCTCTTCGCIt3GCGTCGAC^ 

Gene : HepCla

Segment# : 27
Offset : 391 
1st Codon : 1 

GGNAGRTTSGLVS LLTPGAKQNI QLINTNG 
GGCGGAAACGCTGGCAGAACX^CAftGCGGACTGGTC^ 

Gene : HepCla 

Segment* : 28 
Offset : 406 
1st Codon : 1 

TPGAKQWIQLINTNGSWHIHSTALNCNESL 
ACCCCTGGCGCTAAGCAAAACATTCAGCTCATC^ 

Gene : HepCla 

Segment* : 29 

Offset : 421 

1st Codon : 1 

SWH INSTALNCNE SLHTGWLAGIiFYQHKFN 
AGCTGG-CACATTAACTCCACCGCrCTGAAT^ 

Gene : HepCla 

Segment* : 30 
Offset : 436 
1st Codon : 1 

NTGNLAGLFYQHKFNSSGCPBRLASCRRLT 
AACACAGGCTGCCTGGCTGGCtrrCTTCT^ 

Gene : HepCla 

Segment* : 31 
Offset : 451 
1st Codon : 1 

SSGCPBRLASCRRLTDFDQGWGPISYANGS 
AGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCT^ 



Gene : HepCla 

Segment* : 32 
Offset : 466 
1st Codon : 1 

DFDQGWGPISYANGSGPDQRPYCWHYPPKP 
tTCTCXTTACCCTAACGGAAGCGGACCCGATCAGAGACCCT 



Gene : HepCla

Segment# : 33
Offset : 481
1st Codon : 1

GP D Q R P Y C W HYPPKPCGIVPAKSVCGPVYC 
GGCCCltiACCAAAGGCCITACTGTTGGCftTnvC^ 

Gene : HepCla 

Segment # : 34 
Offset : 496 
1st Codon : 1 

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG 
TGCGGAATCGTOrCGCTAACTCCGTG^ 

Gene : HepCla 

Segments : 35 

Offset : 511 

1st Codon : 1 

P T P S P V V V G TTDRSGAPTYSWGAHDTDVFV 

TTCACAcxcTccccantxn^^ 

Gene : HepCla 

Segment # : 36 
Offset : 526 
1st Codon : 1 

APTYSWGANDTDVPVLNNTR PPLGNHFGCT 
GCCXrCTACCTATAGCTGGGGCGCTAACGATAGC^ 

Gene : HepCla 

Segment# : 37
Offset : 541
1st Codon : 1 

L N N T R PPJ* G NWFGCTWMNSTGPTKVCGAPP 
CTGAATAACACAAGGCCTCCCCTCGGCAATTGGTT^ 

Gene : HepCla 

Segment# : 38
Offset : 556
1st Codon : 1 

WMNSTGFTKVCGAPPCVIGGAGNNTLHCPT 
TGGATGAACTCCACXXSGATTCACAAAGGTCTGCGGAGCCC^ 

Gene : HepCla 

Segments : 39 
Offset : 571 
1st Codon : 1 

C V I G G A G HNTLHCPTDCFRKHPBATYSRCG 
^GCGTCATCGGAGGCGCTGGCAATAACACACTGCATTGCC^ 

Gene : HepCla 

Segment^ : 40 
Offset : 586 
1st Codon : 1 

P C F R K H P B ATYSRCGSGPWITPRCLVDYPY 
GACTGTTTCAGAAAGCATCCC^AAGCCACATACTC^ 

Gene : HepCla 

Segments : 41 
Offset : 601 
1st Codon : 1 

SGPWITPR C L V D YPYRLWHYPCTIHYTIFK 
AGCGGACCCreGATCACACCCAGATGCXntXr 

Gene : HepCla 

Segments : 42 
Offset : 616 
1st Codon : 1 

Rl> W H Y PC TINYTI FKVRMYVGGVBHRLRAA 
AGGCTCTGGCATTACCCTTGCACAATCAATTACACA 

Gene : HepCla 

Segments : 43 
Offset : 631 
1st Codon : 1 

VRMYVGGVBHRLSAACNNTRGBRCDLBDRD 
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GTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGA^ 

Gene : HepCla 

Segment# : 44
Offset : 646
1st Codon : 1

CNWTRGBRCDLBDRDRSBLSPLLLSTTQWQ 
TGCAATTGGACAAGGGGfcGAGAGATGCG^ 

Gene : HepCla 

Segment* : 45 
Offset : 661 
1st Codon : 1 

RSELSPLLLSTTQWQVLPCSFTTLPALSTG 
AGGTCCGAGCTCAGCCCTCTGCTCCTGTCCA 

Gene : HepCla 

Segment* : 46 
Offset : 676 
1st Codon : 1 

VLPCS FTTlfPALSTGLIHLHQNIVDVQYLY 
GTGCTCCCCTGTAGCTTTACCAC»CTG^ 

Gene : HepCla 

Segments : 47 
Offset : 691 
1st Codon : 1 

LIHLHQNIVDVQYLYGVGSSIASWAIKWBY 
CTGATTCACCTCCACCAAAACATTGTGGATCTGCAATAC^ 


Gene : HepCla 

Segment^ : 48 
Offset : 706 
1st Codon : 1 

GVGSS IASWAIK->WEYVVLLFI*LLADARVCS 
GGCGTC^GCrCCAGCATTGCCTCCTGGGCT 

Gene : HepCla 

Segments : 49 
Offset : 721 
1st Codon : 1 

VVLLPLLLADARVCSCLWMMLLISQABAAL 
GTGGTCC1GCTCTTCCTCCTGCTCGCCGATGCCAGA 

Gene : HepCla 

Segments : 50 
Offset : 736 
1st Codon : 1 

C L W M M j L ISQABAALBNLVILNAASLAGTH 
TGCCTCTGGATGATCCTCCTGATTAGCCAAGCCG 

Gene : HepCla 

Segments : 51 
Offset : 751 
1st Codon : 1 

EHLVI LHAASLAGTHGLVSPLVPPCFAWYL 
GAGAATCTGGTCATCCrOUUXXriT^CTCC^ 

Gene : HepCla 

Segments : 52 
Offset : 766 
1st Codon : 1 

GLVS PLVFPCFAWYLKGRWVPGAVYALYGM 
GGCCTXCTGTCCTTCCItXnCTT^ 

Gene : HepCla 

Segments : 53 
Offset : 781 
1st Codon : 1 

KGRWVPGAVYALYGMWPLLLLLLALPQRAY 
AAGGGAAGGTGGGTGCCTGGCGCltnCTATGCCCTCT 

Gene : HepCla

Segment# : 54
Offset : 796
1st Codon : 1

WPLLLLLLALPQRAYALDTEVAASCGGVVL
VTACCGAAGTGGCTGCCTCCTGCGGAGGCGTCGTGCTC 



Gene : HepCla 

Segment* : 55 
Offset : 811 
1st Codon : 1 

ALDTBVAASCGGVVLVGI*MALTLS PYYKRY 
GCCCrCGACACAGAGGTCGCCGCTAGCTGTGGCGG^ 

Gene : HepCla 

Segment * : 56 
Offset : 826 
1st Codon : 1 

V G I# M ALT LSPYYKRYI SWCLWWLQYFLTRV 
GTGGGACTGATGGCCCTCAaxrrCAGCCCTTAC^ 

Gene : HepCla 

Segment* : 57 
Offset : 841 
1st Codon : 1 

ISWCLWWLQYFLTRVBAQLHVWVPPLHVRG 
ATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTC 

Gene : HepCla 

Segment* : 58 
Offset : 856 
1st Codon : 1 

BAQLMVHVPPL W V RGGRDAVILLMCVVHPT 
GAGGCTCAGCTCCACGTCTGGGTCCCC^^ 

Gene : HepCla

Segment# : 59
Offset : 871 
1st Codon : 1 

GRDAVILLMCVVHPTLVPDITKLLLAVPGP 
QGCAGAGACGCTGTGATTCTGCTCATijltyiVXt X riX^ C^ 



Gene : HepCla

Segment# : 60
Offset : 886 

1st Codon : 1 

LVFDXTKLLLAVFG PLNI LQASLLKVPYFV 
CTGGTCTTCGATATCAC3MUVGCTLC1^ 

Gene : HepCla 

Segment* : 61 
Offset : 901 
1st Codon : 1 

LW ILQASLLKVPYPVRVQGLLR ICALARKM 
Cltn^GGATCCTCCAGGCTAGCCTCCTGAAAGTGCCTTACTl'l 

Gene : HepCla 

Segment* : 62 
Offset : 916 
1st Codon : 1 

R V 0 G L L R 1 C ALARKMIGGHYVQMAI I K L G A 
AGGGTCCAGGGACTGCTCAGGATTTGCGCTCTGGClTVGGn^ 

Gene : HepCla 

Segment* : 63 
Offset : 931 

1st Codon : 1 

I G G H Y V Q MAI I KLGALTGTYVYNHLTP L*RD 
ATCGGAGGCC^TTACGTCCAGATGGCCATTATCAAAC^ 



Gene : HepCla

Segment # : 64
Offset : 946
1st Codon : 1

LTGTYVYNHLTPLRDWAHNGLRDLAVAVEP
CTGACAGGCACATACGTCTACAATCACCTCACCCCTC^ 



Gene : HepCla

Segment # : 65
Offset : 961
1st Codon : 1

WAHNGLRDLAVAVEPVVFSQMETKLITWGA
TGGGCTCACAATGGCCTCAGGGATCTOXnrGTG^ 

Gene : HepCla 

Segment* : 66 
Offset : 976 
1st Codon : 1 

VVFSQMETKLITWGADTAACGDIINGLPVS
GTGGTCTTCTCCCAGATGGAGACAAAGCTCATCACATC 

Gene : HepCla 

Segment* : 67 
Offset : 991 
1st Codon : 1 

DTAACGDIINGLPVSARRGREILLGPADGM
GACACAGCCGCTTGCGGAGACATTATCAATGGCCTCOCCGTC^ 

Gene : HepCla

Segment* : 68 
Offset : 1006 
1st Codon : 1 

ARRGREILLGPADGMVSKGWRLLAPITAYA
GCOW3AAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAAT 

Gene : HepCla 

Segment* : 69 
Offset : 1021 
1st Codon : 1

VSKGWRLLAPITAYAQQTRGLLGCIITSLT
GTGTCCAAGGGATGGAGACTGCITXXXOT 

Gene : HepCla 

Segment* : 70 
Offset : 1036 
1st Codon : 1 

QQTRGLLGCIITSLTGRDKNQVEGEVQIVS
CAGCAAACXIAGAGGCXriXX^TGGGATGCATTA 

Gene : HepCla 

Segment* : 71 
Offset : 1051 
1st Codon : 1

GRDKNQVEGEVQIVSTAAQTFLATCINGVC
GGCAGAGACAAAAACCAAGTGGAAGGaSAAGTGCAAATCGT^^ 

Gene : HepCla 

Segment* : 72 
Offset : 1066 
1st Codon : 1 

TAAQTFLATCINGVCWTVYHGAGTRTIASP 
ACCGCTGCCCAAACClTlVlt^^ACCTCTATCAATG 

Gene : HepCla 

Segment* : 73 
Offset : 1081 
1st Codon : 1 

WTVYHGAGTRTIASPKGPVIQMYTNVDQDL
TGGACAGTGTATCACGGAGCCGGAACCAGAACCATTG^ 

Gene : HepCla 

Segment* : 74 
Offset : 1096 
1st Codon : 1 

KGPVIQMYTNVDQDLVGWPAPQGSRSLTPC

AAGGGftCCCCTCATCCAAATGTATACCAATGTGGATCAGGATCT^ 



Gene : HepCla

Segment* : 75 
Offset : 1111 
1st Codon : 1 

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD 
GTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCArc 

Gene : HepCla 

Segment* : 76 
Offset : 1126 
1st Codon : 1 

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL
ACCTGTGGCTCCAGCGATCTGTATCTGGTCACCAGA 

Gene : HepCla 

Segment* : 77 
Offset : 1141 
1st Codon : 1 

VIPVRRRGDSRGSLLSPRPISYLKGSSGGP
GTGATTCCCGTCAGGAGAAGGGGAGACTTCAGGGGAAGCCTC 

Gene : HepCla 

Segment* : 78 
Offset : 1156 
1st Codon : 1 

SPRPISYLKGSSGGPLLCPAGHAVGIFRAA
AGCCCIAGGCCTATCTCCTACCTCAAGGGAAGCTC^ 

Gene : HepCla 

Segment* : 79 
Offset : 1171 
1st Codon : 1 

LLCPAGHAVGIFRAAVCTRGVAKAVDFIPV
CTGCTCTGCCCTGCOGGACACGCrGTGGGAAT^^ 

Gene : HepCla 

Segment* : 80 
Offset : 1186 
1st Codon : 1 

VCTRGVAKAVDFIPVENLETTMRSPVFTDN
GTGTGTACCAGAGC^^iXA^AAACCCGTC^ 

Gene : HepCla 

Segment* : 81 
Offset : 1201 
1st Codon : 1 

ENLETTMRSPVFTDNSSPPAVPQSFQVAHL
GAGAATCTGGAAACCACAATGAGAAGCrCTGTGTTTACC^ 

Gene : HepCla 

Segment* : 82 
Offset : 1216 
1st Codon : 1 

SSPPAVPQSFQVAHLHAPTGSGKSTKVPAA
AGCTCCCCCCCTGCtXTTCCCCCA^ 

Gene : HepCla 

Segment* : 83 
Offset : 1231 
1st Codon : 1 

HAPTGSGKSTKVPAAYAAQGYKVLVLNPSV 
CACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTC^ 

Gene : HepCla 

Segment* : 84 
Offset : 1246 

1st Codon : 1

YAAQGYKVLVLNPSVAATLGFGAYMSKAHG 
TACGCTGCCCAAGGCTATAAGGTCCTGCrreCT 

Gene : HepCla 

Segment # : 85
Offset : 1261
1st Codon : 1

AATLGFGAYMSKAHGIDPNIRTGVRTITTG
ftTCAGCAAAGCCCATGGC^TTGACCCTA^ 



Gene : HepCla

Segment # : 86
Offset : 1276
1st Codon : 1

IDPNIRTGVRTITTGSPITYSTYGKFLADG
ATCGATCCCAATATCAGAACCXXiAGTGAGAACCATTACC^ 



Gene : HepCla

Segment # : 87
Offset : 1291
1st Codon : 1

SPITYSTYGKFLADGGCSGGAYDIIICDEC
AGCCCTATCACATACTCCACCTATGGCAAAT
^TGGCGGATGCTCCGGGGGAGCCTATGACATTATCATTT^



Gene : HepCla 

Segment* : 88 
Offset : 1306 
1st Codon : 1 

GCSGGAYDIIICDECHSTDATSILGIGTVL
GGCTCTAGCGGAGGOGCITACGATATCATTATCTGTGAOGAATC 

Gene : HepCla 

Segment # : 89 
Offset : 1321 
1st Codon : 1 

HSTDATSILGIGTVLDQAETAGARLVVLAT
C^CTTCX^CCGATGCXACAAGCATTCTGGGAATCT 

Gene : HepCla 

Segment* : 90 
Offset : 1336 
1st Codon : 1 

DQAETAGARLVVLATATPPGSVTVPHPNIE
^CCAAGCCGAAACCGCTGGCGCTAGGCTCGTGGTC^ 

Gene : HepCla 

Segment* : 91 
Offset : 1351 
1st Codon : 1 

ATPPGSVTVPHPNIEEVALSTTGEIPFYGK
VTCCCTTTCTATGGCAAA 



Gene : HepCla 

Segment* : 92 
Offset : 1366 
1st Codon : 1 

EVALSTTGEIPFYGKAIPLEVIKGGRHLIF
GAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACG 

Gene : HepCla 

Segment* : 93 
Offset : 1381 
1st Codon : 1 

AIPLEVIKGGRHLIFCHSKKKCDELAAKLV
GCCATTCCCCTCGAGGTCATCAAAGGOGGAAGGCATCTGATT^ 



Gene : HepCla

Segment # : 94
Offset : 1396
1st Codon : 1

CHSKKKCDELAAKLVALGINAVAYYRGLDV
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGG
TATTACAGAGGCCTOGACGTC

Gene : HepCla

Segment # : 95
Offset : 1411
1st Codon : 1

ALGINAVAYYRGLDVSVIPTSGDVVVVATD
GCCCTCGGCATTAACC^'ltntAUJTl'ACrATA^

Gene : HepCla

Segments : 96 
Offset : 1426 
1st Codon : 1 

SVIPTSGDVVVVATDALMTGYTGDFDSVID 
AGCGTCATCCCTACCTCCGGCGATGTGGTCGTG^ 

Gene : HepCla 

Segments : 97 
Offset : 1441 

1st Codon : 1 

ALMTGYTGDFDSVIDCNTCVTQTVDFSLDP
GCXTCTCATGACAGGCTATACCXXSAGACTTTGAC^ 

Gene : HepCla 

Segments : 98 
Offset : 1456 
1st Codon : 1 

CNTCVTQTVDFSLDPTFTIETTTLPQDAVS
TGCAATACCTGTGTGACACAGACAGTGGATTTCTC^ 

Gene : HepCla 

Segment # : 99
Offset : 1471
1st Codon : 1

TFTIETTTLPQDAVSRTQRRGRTGRGKPGI
ACCTTTACCATTGAGACAACCACACrGCCTC^ 

Gene : HepCla 

Segments : 100 
Offset : 1486
1st Codon : 1 

RTQRRGRTGRGKPGIYRFVAPGERPSGMFD 
AGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCT^ 

Gene : HepCla 

Segments : 101 
Offset : 1501 
1st Codon : 1 

YRFVAPGERPSGMFDSSVLCECYDAGCAWY
TACAGATTCGTOGCCCCTGGCGAAAGGCC1AGOGGAATGTTTGA 

Gene : HepCla 

Segments : 102 
Offset : 1516 
1st Codon : 1 

SSVLCECYDAGCAWYELTPAETTVRLRAYM

Gene : HepCla 

Segments : 103 
Offset : 1531 
1st Codon : 1 

ELTPAETTVRLRAYMNTPGLPVCQDHLEFW
GAGCTCACCCCTGCCGAAAC^ACAGTGAGACTCA^ 

Gene : HepCla 

Segments : 104 
Offset : 1546 
1st Codon : 1 

NTPGLPVCQDHLEFWEGVFTGLTHIDAHFL
AACACACCCGGACTGCCTGTGTGTCAGGATCACCT 

Gene : HepCla 

Segments : 105 
Offset : 1561 
1st Codon : 1 

EGVFTGLTHIDAHFLSQTKQSGENFPYLVA
GAGGGAG'iUrriACCGGACTGACACACATTGACGCTCACT r rC 

Figure 26 (Cont)

Gene : HepCla

Segment # : 106
Offset : 1576
1st Codon : 1 

SQTKQSGENFPYLVAYQATVCARAQAPPPS
AGCCAAACCAAACAGTCCX3GCGAAAACTTTCCCT 

Gene : HepCla 

Segment* : 107 
Offset : 1591 
1st Codon : 1 

YQATVCARAQAPPPSWDQMWKCLIRLKPTL
TACC^AGCCACAGTGTGTGCCAGAGCCCAAGCCCCTCCCCCT 

Gene : HepCla 

Segment # : 108
Offset : 1606 
1st Codon : 1 

WDQMWKCLIRLKPTLHGPTPLLYRLGAVQN 
TCGGATCAGATGTGGAAATCCCTCATCAGACTGAAACCC^ 

Gene : HepCla 

Segments : 109 
Offset : 1621 
1st Codon : 1 

HGPTPLLYRLGAVQNEVTLTHPVTKYIMTC 
CACGGACCCACACCCXrTCCTGTATAGGCTCGGCGCTGTC 

Gene : HepCla 

Segment # : 110 
Offset : 1636 
1st Codon : 1 

EVTLTHPVTKYIMTCMSADLEVVTSTWVLV
GAGGTCACCCTCACCCATCCCGTCACCAAATACA^ 

Gene : HepCla 

Segment # : 111 
Offset : 1651 
1st Codon : 1 

MSADLEVVTSTWVLVGGVLAALAAYCLSTG
ATGTCCGCCGATCTGGAAGTGGTCACCTCCACCTO^^ 

Gene : HepCla 

Segments : 112 
Offset : 1666 
1st Codon : 1 

GGVLAALAAYCLSTGCVVIVGRIVLSGKPA
GGtt3»GTGCTCGCCGCTCTGGCTGCCTO 

Gene : HepCla 

Segments : 113 
Offset : 1681
1st Codon : 1

CVVIVGRIVLSGKPAIIPDREVLYREFDEM
TGOGTCGTGATTGTGGGAAGGATTGT GC TCAGOGGAAAGCCTC 

Gene : HepCla 

Segments : 114 
Offset : 1696 
1st Codon : 1 

IIPDREVLYREFDEMEECSQHLPYIEQGMM
ATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATG 

Gene : HepCla 

Segments : 115 
Offset : 1711 
1st Codon : 1 

EECSQHLPYIEQGMMLAEQFKQKALGLLQT
GAGGAATGCTCCXAGCATCIGCCTTACATTGAGCAA^ 

Gene : HepCla 

Segments : 116 
Offset : 1726 
1st Codon : 1 

LAEQFKQKALGLLQTASRQAEVIAPAVQTN
CTGGCTGAGCAATTCAAAC
ATCGCTCCCGCTGTGCAAACCAAT



Gene : HepCla 

Segments : 117 
Offset : 1741 
1st Codon : 1 

ASRQAEVIAPAVQTNWQKLEVFWAKHMWNF
GCCTCCAGGCAAGCCGAAGTGAT
VTATGTGGAACTTT



Gene : HepCla 

Segments : 118 
Offset : 1756 
1st Codon : 1 

WQKLEVFWAKHMWNFISGIQYLAGLSTLPG
TGGCAAAAGCTCGAGGTCTICTGGGCCAAACACATGTGGAA 

Gene : HepCla 

Segments : 119 
Offset : 1771 
1st Codon : 1 

ISGIQYLAGLSTLPGNPAIASLMAFTAAVT
ATCTCCGGCATTCAGTAT
ICACTGCCTGGCAATCCCGCTATCGCTAGCC^



Gene : HepCla 

Segments : 120 
Offset : 1786 
1st Codon : 1 

NPAIASLMAFTAAVTSPLTTSQTLLFNILG
AACCCTGCCATTGCCTCCCTGATGGCCTTTACCGCTGCC^ 

Gene : HepCla

Segment # : 121
Offset : 1801
1st Codon : 1

SPLTTSQTLLFNILGGWVAAQLAAPGAATA




Gene : HepCla 

Segments : 122 
Offset : 1816 
1st Codon : 1 

GWVAAQLAAPGAATAFVGAGLAGAAIGSVG 
GGCTGGGTGGCTCCCCAACroGCItSCCXXTG 



Gene : HepCla 

Segments : 123 
Offset : 1831 
1st Codon : 1 

FVGAGLAGAAIGSVGLGKVLVDILAGYGAG
TrCGTCGGCGCTGGCCTCGCCGGAGCCGCTAT
\TATCCTCGCCGGATACGGAGCCGGA



Gene : HepCla 

Segments : 124 
Offset : 1846 
1st Codon : 1 

LGKVLVDILAGYGAGVAGALVAFKIMSGEV
CTGGGAAAGGTCCTGGTCGACATTCTGGCTGGCrATGGCGC^ 

Gene : HepCla 

Segments : 125 
Offset : 1861 
1st Codon : 1 

VAGALVAFKIMSGEVPSTEDLVNLLPAILS
GTGGCTGGCGCfLUtXplllX X ICUTl' A AGATT A TGTCCG G 



Gene : HepCla

Segment # : 126
Offset : 1876
1st Codon : 1

PSTEDLVNLLPAILSPGALVVGVVCAAILR
CCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTA 



Gene : HepCla

Segment # : 127
Offset : 1891 
1st Codon : 1 

PGALVVGVVCAAILRRHVGPGEGAVQWMNR 

Gene : HepCla 

Segment # : 128
Offset : 1906
1st Codon : 1

RHVGPGEGAVQWMNRLIAFASRGNHVSPTH
AGGCATGTGGGACCCGGAGAGGGAGCCGTCC^VGTGGATGA^ 

Gene : HepCla 

Segments : 129 
Offset : 1921 
1st Codon : 1 

LIAFASRGNHVSPTHYVPESDAAARVTAIL
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTC 

Gene : HepCla 

Segments : 130 
Offset : 1936 
1st Codon : 1 

YVPESDAAARVTAILSSLTVTQLLRRLHQW
TAOntXXXXSAAAGCGATGCCGCTGCCAG^ 

Gene : HepCla 

Segments : 131 
Offset : 1951 
1st Codon : 1 

SSLTVTQLLRRLHQWISSECTTPCSGSWLR
AGCTCCCIX^CAGTGACACAGCTCCTGAGAAGGCT 

Gene : HepCla 

Segments : 132 
Offset : 1966 
1st Codon : 1 

ISSECTTPCSGSWLRDIWDWICEVLSDFKT
ATCTCCAXSCGAATCCACAACCOCTT CC TC 

Gene : HepCla 

Segments : 133 
Offset : 1981 
1st Codon : 1 

DIWDWICEVLSDFKTWLKAKLMPQLPGIPF
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTO 

Gene : HepCla 

Segments : 134 
Offset : 1996 
1st Codon : 1 

WLKAKLMPQLPGIPFVSCQRGYKGVWRGDG
TGGCTCAAGGCTAAGCTCATGCCTCAGCrCCCC^ 

Gene : HepCla 

Segments : 135 
Offset : 2011 
1st Codon : 1 

VSCQRGYKGVWRGDGIMHTRCHCGAEITGH
GTGTCCTGra^AGGGGATACAAAGGCGTCTGGAGAGGCGA 

Gene : HepCla 

Segments : 136 
Offset : 2026 
1st Codon : 1 

IMHTRCHCGAEITGHVKNGTMRIVGPRTCR
ATCATGCACACAAGGTGTCACTCTGGCGCTGAGATTA 



Gene : HepCla

Segment # : 137
Offset : 2041
1st Codon : 1

VKNGTMRIVGPRTCRNMWSGTFPINAYTTG
GTGAAAAACGGAACCATGAGGATTGTGGGACCCAGAACCn^ 

Gene : HepCla

Segment # : 138
Offset : 2056
1st Codon : 1

NMWSGTFPINAYTTGPCTPLPAPNYTFALW
AACATtn^GTraSGCACATTCCCTATCAATGCCTATACCA 

Gene : HepCla 

Segment # : 139 
Offset : 2071 
1st Codon : 1

PCTPLPAPNYTFALWRVSAEEYVEIRRVGD 
CCCTCITVCCCCTCTGCCTGCCCCTAA^ 



Gene : HepCla

Segment # : 140
Offset : 2086
1st Codon : 1

RVSAEEYVEIRRVGDFHYVTGMTTDNLKCP
AGGGTCAGCtXTTGAGGAATACGTCGAGATTAGGAGA^ 

Gene : HepCla 

Segments : 141 
Offset : 2101 
1st Codon : 1 

FHYVTGMTTDNLKCPCQVPSPEFFTELDGV
TTCCATTACGTCACCGGAATGACAACCGATAACC^ 

Gene : HepCla 

Segment # : 142
Offset : 2116
1st Codon : 1

CQVPSPEFFTELDGVRLHRFAPPCKPLLRE
TGCCAAGTGCCTAGCCCTGA GTrrri 

Gene : HepCla 

Segment # : 143 
Offset : 2131 
1st Codon : 1 

RLHRFAPPCKPLLREEVSFRVGLHEYPVGS
AGGCTCCACAGATTCGCTCCCCCTTGCAAACCCCTCCT& 

Gene : HepCla

Segment # : 144
Offset : 2146
1st Codon : 1

EVSFRVGLHEYPVGSQLPCEPEPDVAVLTS
GAGGTCAGCTTTAGGGTCGGCCTCCACGAATACC^ 

Gene : HepCla 

Segment # : 145 
Offset : 2161 
1st Codon : 1 

QLPCEPEPDVAVLTSMLTDPSHITAEAAGR
CAG<TTtXXXTGTGAGCCTGAGCCTG 

Gene : HepCla 

Segments : 146 
Offset : 2176 
1st Codon : 1 

MLTDPSHITAEAAGRRLARGSPPSMASSSA
ATGCTCACOGATCCCTCCCACATTACCGCTGAG 

Gene : HepCla 

Segment # : 147 
Offset : 2191 
1st Codon : 1 

RLARGSPPSMASSSASQLSAPSLKATCTAN 
AGGCTCGCCAGAGGCTCCCCCCCrAGC AT GG C CTCCAGCT

Gene : HepCla

Segments : 148 
Offset : 2206 
1st Codon : 1 

SQLSAPSLKATCTANHDSPDAELIEANLLW
AGCO^ACTGTCCGCCCCTAGCCTCAAGGCTACC^ 

Gene : HepCla 

Segment # : 149
Offset : 2221
1st Codon : 1

HDSPDAELIEANLLWRQEMGGNITRVESEN
CACGATAGCCCTGACGCTGAGCTCATCGAA 

Gene : HepCla 

Segments : 150 
Offset : 2236 
1st Codon : 1

RQEMGGNITRVESENKVVILDSFDPLVAEE
AGGCAAGAGATGGGCGGAAACATTACOU3AGTGGAAA^ 



Gene : HepCla 

Segments : 151 
Offset : 2251 
1st Codon : 1 

KVVILDSFDPLVAEEDEREISVPAEILRKS
AAGGTCGTGATTCTGGATAGCTTTGAC
^TGAGAGAGAGATTAGCGTCCCOGCTGAGATTCTGAGAAAGTCC



Gene : HepCla 

Segments : 152 
Offset : 2266 

1st Codon : 1 

DEREISVPAEILRKSRRFAQALPVWARPDY
GACGAAAGGGAAAT
VTCCTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTC^



Gene : HepCla 

Segment # : 153
Offset : 2281
1st Codon : 1

RRFAQALPVWARPDYNPPLVETWKKPDYEP
AGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCA



Gene : HepCla 

Segments : 154 
Offset : 2296 

1st Codon : 1 

NPPLVETWKKPDYEPPVVHGCPLPPPRSPP
AACCCrCCCCTCGTGGAAACCTGGAAGAAA 



Gene : HepCla 

Segments : 155 
Offset : 2311 
1st Codon : 1

PVVHGCPLPPPRSPPVPPPRKKRTVVLTES
ntrCGTCCCCCCTCCCAGAAACy^AA^ 



Gene : HepCla 

Segments : 156 
Offset : 2326 
1st Codon : 1 

VPPPRKKRTVVLTESTLSTALAELATKSFG



Gene : HepCla 

Segments : 157 
Offset : 2341
1st Codon : 1

TLSTALAELATKSFGSSSTSGITGDNTTTS
ACCCTCAGCAOWGCXCTCGCOGAACTGGC^^ 



Gene : HepCla

Segment # : 158
Offset : 2356
1st Codon : 1

SSSTSGITGDNTTTSSEPAPSGCPPDSDAE
AGCTCCAGCACAAGCX5GAATCACAGGCGATAACACAACCACAAGCT^ 

Gene : HepCla 

Segments : 159 
Offset : 2371 
1st Codon : 1 

SEPAPSGCPPDSDAESYSSMPPLEGEPGDP
AGCGAACCCGCTCCCTCCGGCTGTCXX:CXrit» 

Gene : HepCla 

Segments : 160 
Offset : 2386
1st Codon : 1

SYSSMPPLEGEPGDPDLSDGSWSTVSSEAG
AGCTATAGCTXTCATGCCTCCCCTCGAGGGAGAGCCTGGC^ 

Gene : HepCla 

Segments : 161 
Offset : 2401 
1st Codon : 1 

DLSDGSWSTVSSEAGTEDVVCCSMSYSWTG
GACCTCAGCGATGGCTCCTGGTCCACCGTCAGCTCOGAGGCTGGC^ 

Gene : HepCla 

Segments : 162 
Offset : 2416 
1st Codon : 1 

TEDVVCCSMSYSWTGALVTPCAAEEQKLPI
ACCGAAGACGTCGTGTGTTGCTCCATGTCCrACICCTGGACAGGCG^ 



Gene : HepCla

Segment # : 163
Offset : 2431
1st Codon : 1

ALVTPCAAEEQKLPINALSNSLLRHHNLVY
tfCAATGCCCTCAGCAATAGCCTCCTGAGACACCA 




Gene : HepCla

Segment # : 164
Offset : 2446
1st Codon : 1

NALSNSLLRHHNLVYSTTSRSACQRQKKVT
VTCACAATCTGGTCTACTCCACCACAAGCAGAAGCGCTTGCCA 



Gene : HepCla 

Segments : 165 
Offset : 2461 
1st Codon : 1 

STTSRSACQRQKKVTFDRLQVLDSHYQDVL 
AGCACAACCTCCAGGTCCGCCItn'CAGAGACAGAAAAAGGTCAC 



Gene : HepCla

Segment # : 166
Offset : 2476
1st Codon : 1

FDRLQVLDSHYQDVLKEVKAAASKVKANLL
TTOGATAGGCTCCAGGTCCTGGATAGCC^TTACCAAGAOGTCCre 

Gene : HepCla 

Segments : 167 
Offset : 2491 
1st Codon : 1 

KEVKAAASKVKANLLSVEEACSLTPPHSAK
AAGGAAGTGAAAGCCGCTGCCTC



Gene : HepCla 

Segments : 168 
Offset : 2506 
1st Codon : 1 

SVEEACSLTPPHSAKSKFGYGAKDVRCHAR
AGCGTOGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAA^

Gene : HepCla

Segments : 169 
Offset : 2521 
1st Codon : 1 

SKFGYGAKDVRCHARKAVAHINSVWKDLLE
AGCAAATTCt3GATACCGAGCCAAAGACGTCAGGTGTCACGCT 



Gene : HepCla 

Segments : 170 
Offset : 2536 
1st Codon : 1 

KAVAHINSVWKDLLEDSVTPIDTTIMAKNE
AAGGCTGTGGCTCACA'
^TCTGCTCGAGGATAGCGTCACCCCTATCXATACCA



Gene : HepCla 

Segments : 171 
Offset : 2551 
1st Codon : 1

DSVTPIDTTIMAKNEVFCVQPEKGGRKPAR
GACTCCGTGACACXXATTGACACAACCATTATGGCTAAGAAT



Gene : HepCla

Segment # : 172
Offset : 2566
1st Codon : 1

VFCVQPEKGGRKPARLIVFPDLGVRVCEKM
&CCTCGGCGTCAGGGTCTGCGAAAAGATG 



Gene : HepCla 

Segments : 173 
Offset : 2581 
1st Codon : 1 

LIVFPDLGVRVCEKMALYDVVSKLPLAVMG
CTGATTGTGTTTCCCGATCTGGGAGTGAGAGTGTGTGAGA 

Gene : HepCla 

Segments : 174 

Offset : 2596 

1st Codon : 1 

ALYDVVSKLPLAVMGSSYGFQYSPGQRVEF
GCCCTCTACGAT
VTGGGCTCCAGCTATGGCTTTCAGTATAGCCCTGGCCAAA



Gene : HepCla 

Segments : 175 
Offset : 2611 
1st Codon : 1 

SSYGFQYSPGQRVEFLVQAWKSKKTPMGFS
AGCTCCTACGGATTCCAATACTCCCCCGGACAGAGA 

Gene : HepCla 

Segments : 176 
Offset : 2626 
1st Codon : 1 

LVQAWKSKKTPMGFSYDTRCFDSTVTESDI
CTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCAT G 

Gene : HepCla 

Segments : 177 
Offset : 2641 
1st Codon : 1 

YDTRCFDSTVTESDIRTEEAIYQCCDLDPQ
TACGATACCAGATGCTTTGACTCCACCGTCACCGAAAGCG^ 



Gene : HepCla

Segment # : 178
Offset : 2656
1st Codon : 1

RTEEAIYQCCDLDPQARVAIKSLTERLYVG
AGGACftGAGGAAGCCATTTACCAATGCTGTGACCTCGACCCTC^ 



Gene : HepCla

Segment # : 179
Offset : 2671
1st Codon : 1

ARVAIKSLTERLYVGGPLTNSRGENCGYRR
GCCIAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTACX?rCGGCGGA 

Gene : HepCla 

Segment # : 180
Offset : 2686
1st Codon : 1

GPLTNSRGENCGYRRCRASGVLTTSCGNTL
GGCCCTCIX^OVAACTCCAGGGGAGAG 

Gene : HepCla 

Segment* : 181 
Offset : 2701 
1st Codon : 1 

CRASGVLTTSCGNTLTCYIKARAACRAAGL
TGCAGAGCCTCTCGGCGTCCTGACAACCrcCTGCGG 

Gene : HepCla 

Segment* : 182 
Offset : 2716 
1st Codon : 1 

TCYIKARAACRAAGLQDCTMLVCGDDLVVI
ACCTGTTACATTAAGGCTAGGGCTGCCTGTAGGGCTGC^ 

Gene : HepCla 

Segments : 183 
Offset : 2731 
1st Codon : 1 

QDCTMLVCGDDLVVICESAGVQEDAASLRA
CAGGATTGCACAATGCTCGTl
&TCTGTGAGTCCGCCGGAGTGCAAGAGGATGCCGCTAGCCT

Gene : HepCla

Segment # : 184
Offset : 2746
1st Codon : 1

CESAGVQEDAASLRAFTEAMTRYSAPPGDP
ICCGAAGCCATGACCAGATA

Gene : HepCla

Segment # : 185
Offset : 2761
1st Codon : 1

FTEAMTRYSAPPGDPPQPEYDLELITSCSS
TTCACAGAGGCTATGACAAGGTATAGCGCTtXXXTCT

Gene : HepCla

Segment # : 186
Offset : 2776
1st Codon : 1

PQPEYDLELITSCSSNVSVAHDGAGKRVYY
CCCCAACCCGAATACGATCTGGAACTGATTACCTCCT^

Gene : HepCla

Segment # : 187
Offset : 2791
1st Codon : 1

NVSVAHDGAGKRVYYLTRDPTTPLARAAWE
AACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAGA^

Gene : HepCla

Segment # : 188
Offset : 2806
1st Codon : 1

LTRDPTTPLARAAWETARHTPVNSWLGNII
CTGACAAGGGATCCCACAACCCCrCTGG^

Gene : HepCla

Segment # : 189
Offset : 2821
1st Codon : 1

TARHTPVNSWLGNIIMFAPTLWARMILMTH
ACCGCTAQGCATACCCCTGTGAATAGCTC



Gene : HepCla

Segment # : 190
Offset : 2836
1st Codon : 1

MFAPTLWARMILMTHFFSVLIARDQLEQAL
ATGTTTGCCCCTA<XCTCTGGGCTAGGATGATCCTC 



Gene : HepCla

Segment # : 191
Offset : 2851
1st Codon : 1

FFSVLIARDQLEQALDCEIYGACYSIEPLD
TTCTTTAGCCTCCTGATTGCCAGAGACCAACTGGAAC^ 

Gene : HepCla 

Segment* : 192 
Offset : 2866 
1st Codon : 1 

DCEIYGACYSIEPLDLPPIIQRLHGLSAFS
GACTGTGAGATTTACGGAGCCn3TTACTCCATC^ 

Gene : HepCla 

Segments : 193 
Offset : 2881 
1st Codon : 1 

LPPIIQRLHGLSAFSLHSYSPGEINRVAAC
CTGCCTCCCATTATra^AGGCTCCACGGAC^ 

Gene : HepCla 

Segment* : 194 
Offset : 2896 
1st Codon : 1 

LHSYSPGEINRVAACLRKLGVPPLRAWRHR

Gene : HepCla 

Segment* : 195 
Offset : 2911 
1st Codon : 1 

LRKLGVPPLRAWRHRARSVRARLLARGGRA 
CTGAGAAAGCTCGC C GTCCCCCCTCTtj A GAGCCTG 

Gene : HepCla 

Segment* : 196 
Offset : 2926 
1st Codon : 1 

ARSVRARLLARGGRAAICGKYLFNWAVRTK
GCCAGAAGCGTCAGGGCTAGGCTCCTGCCTAGG 

Gene : HepCla 

Segment # : 197
Offset : 2941
1st Codon : 1

AICGKYLFNWAVRTKLKLTPIAAAGRLDLS
GCCATTTGCGGAAAGTATCltyrri'AACTGGGCCCTCAGGACA 

Gene : HepCla 

Segment* : 198 
Offset : 2956 
1st Codon : 1 

LKLTPIAAAGRLDLSGWFTAGYSGGDIYHS 
CTGAAACTGACACCCATTXjCCGCTGCCGGAAGGCTCGACCT 

Gene : HepCla 

Segment* : 199 
Offset : 2971 
1st Codon : 1 

GWFTAGYSGGDIYHSVSHARPRWFWFCLLL
GGCTGGTTCACAGCCGGATACTCCGGCGGAGACATTTAC^ 

Gene : HepCla

Segment # : 200
Offset : 2986
1st Codon : 1 

VSHARPRWFWFCLLLLAAGVGIYLLPNRAA
GTGTrcCACGCTAGGCCTAGGTGGlTC^ 

Gene : HepCla 

Segments : 201 
Offset : 3001 
1st Codon : 1 

LAAGVGIYLLPNRAA 
CTGGCTGCCGGAGTGGGAATCTATCTGCT 
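
The ordered listing above follows a regular pattern: each segment is 30 residues long, each Offset is 15 greater than the previous one (so adjacent segments share 15 residues), and the final segment, at Offset 3001, holds only the last 15 residues. The following Python sketch reproduces that tiling and then shuffles the segments. It is a minimal illustration under stated assumptions, not the method used to prepare the figure: the function names `tile_segments` and `scramble` are hypothetical, the toy parent string is a placeholder rather than the HepCla sequence, and the seeded shuffle stands in for whatever scrambling procedure was actually applied, which this section does not state.

```python
import random

SEGMENT_LENGTH = 30  # residues per segment, as observed in the listing
STEP = 15            # Offset increment between consecutive segments


def tile_segments(parent, length=SEGMENT_LENGTH, step=STEP):
    """Return (segment number, 1-based offset, peptide) tuples.

    Segments overlap by (length - step) residues; the last one may be shorter.
    """
    segments = []
    for number, start in enumerate(range(0, len(parent), step), start=1):
        segments.append((number, start + 1, parent[start:start + length]))
    return segments


def scramble(segments, seed=0):
    """Return a shuffled copy of the segments.

    The seed and the shuffle itself are illustrative assumptions only.
    """
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Placeholder parent of 3015 residues (NOT the real sequence), chosen so the
# segment count and offsets line up with the listing's numbering scheme.
toy_parent = ("ACDEFGHIKLMNPQRSTVWY" * 151)[:3015]
segments = tile_segments(toy_parent)
scrambled = scramble(segments)

for number, offset, peptide in scrambled[:3]:
    print(f"HepCla #{number} (offset {offset}): {peptide}")
```

With a 3015-residue parent this yields 201 segments, with segment 49 at Offset 721 and segment 201 at Offset 3001, matching the numbering used in the listing.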

Segments in scrambled order:

HepCla #77 

VIPVRRRGDSRGSLLSPRPISYLKGSSGGP
GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTC^^ 

HepCla #68 

ARRGREILLGPADGMVSKGWRLLAPITAYA
GCCAGAAGGGGAAGGGAAATCCltXrTGGGACCCGCTG^ 

HepCla #143 

RLHRFAPPCKPLLREEVSFRVGLHEYPVGS
AGGCTCCACAGATTCGCTCCCCCTTGCAAACCrc 

HepCla #66 

VVFSQMETKLITWGADTAACGDIINGLPVS
GTGGTCTTCTCCO^TGGAGACAAAGCTCATCACA 

HepCla #79

LLCPAGHAVGIFRAAVCTRGVAKAVDFIPV
ITGTGGATTTCATTCCCGTC 

HepCla #113 

CVVIVGRIVLSGKPAIIPDREVLYREFDEM
TGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCGGAAAGC^ 

HepCla #139 

PCTPLPAPNYTFALWRVSAEEYVEIRRVGD

HepCla #174 

ALYDVVSKLPLAVMGSSYGFQYSPGQRVEF
GCCCrrCTACGATGTGGTCAGCAAACTGCCTCTGGCT^ 

HepCla #57 

ISWCLWWLQYFLTRVEAQLHVWVPPLNVRG

ATCTCXrrGGTGTCrGTGGTGGCrCCAG^ 

HepCla #51 

ENLVILNAASLAGTHGLVSFLVFFCFAWYL
GAGAATCTGGTCATCCTCAACGCTG^CTCCC^ 

HepCla #193 

LPPIIQRLHGLSAFSLHSYSPGEINRVAAC
CTCCCrcCCATTATCCAAAGGCTCCaWCGGACT^ 

HepCla #154 

NPPLVETWKKPDYEPPVVHGCPLPPPRSPP

HepCla #48 

GVGSSIASWAIKWEYVVLLFLLLADARVCS
GGCGTCGGCTCC3^GCATTGCCTCCTG^ 

HepCla #37 

LNNTRPPLGNWFGCTWMNSTGFTKVCGAPP
CIX^TAACAaUUgaxntXCCTCCGC^ 

HepCla #185 

FTEAMTRYSAPPGDPPQPEYDLELITSCSS
ATCACAAGCTGTAGCTCC

HepCla #54 

WPLLLLLLALPQRAYALDTEVAASCGGVVL 
TGGCCTCTGCrcCTGCTCCTGCTOGCCCTCCCCCA^ 

HepCla #70 

QQTRGLLGCIITSLTGRDKNQVEGEVQIVS
CAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTC 

HepCla #82 

SSPPAVPQSFQVAHLHAPTGSGKSTKVPAA 
AGCTCCCXXTCCTGCCGTCrCCCA^ 

HepCla #104 

NTPGLPVCQDHLEFWEGVFTGLTHIDAHFL
AACACACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGT^ 

HepCla #26 

VLLLFAGVDAETHVTGGNAGRTTSGLVSLL
GTGCTCCTGCTCTTCGCTGGCGTCGACGCTGA^ 

HepCla #110 

EVTLTHPVTKYIMTCMSADLEVVTSTWVLV
GAGGTO^CCCTCACCCATCCCGTCACCAAATACATTATGAC^ 

HepCla #56 

VGLMALTLSPYYKRYISWCLWWLQYFLTRV
GTGGGACTGATGGCCCTCACCCTCAGCCCTTACTATAAGA 

HepCla #197 

AICGKYLFNWAVRTKLKLTPIAAAGRLDLS
GCCATTTGCGGAAAGTATCTGTT/TAACTGGGCCGTC^ 

HepCla #25 

IAYFSMVGNWAKVLVVLLLFAGVDAETHVT
ATCGCTTACTTTAGCATGGTGG^SAAACTGGGCCAAft 

HepCla #147 

RLARGSPPSMASSSASQLSAPSLKATCTAN
AGGOTCGCCAGAGGCTCCCCCCCTAGCATGGCCT^ 

HepCla #52 

GLVSFLVFFCFAWYLKGRWVPGAVYALYGM
GGCCTOGTGTCCTTCCTCGTGTTTTTCTGTTT^ 

HepCla #145 

QLPCEPEPDVAVLTSMLTDPSHITAEAAGR

CAGCTCCXX^TGTGAiXXrrGAGCCT^ 

HepCla #171 

DSVTPIDTTIMAKNEVFCVQPEKGGRKPAR
GACTCCGTGACACCCATTGACACAACC^TTATGGCTAAGAATGA 

HepCla #84 

YAAQGYKVLVLNPSVAATLGFGAYMSKAHG
TACGCTGCXX»AGGCTATAAGGTCCTGGTCCTGA^ 

HepCla #14 

VRNSTGLYHVTNDCPNSSIVYEAADAILHT
GTGAGAAACTCCACCGGACTGTATCACGTCAC 

HepCla #175 

SSYGFQYSPGQRVEFLVQAWKSKKTPMGFS
AGCTCCTAOGGATTCCAATACTCXXXXGGACA^^ 

HepCla #67 

DTAACGDIINGLPVSARRGREILLGPADGM
GACACAGCJCGCTTCCGGAGACATTATCAATGGCCTC 

HepCla #148 

SQLSAPSLKATCTANHDSPDAELIEANLLW
AGCCAACTCTCXXSCCCCTAGCCTCAAG

HepCla #120

NPAIASLMAFTAAVTSPLTTSQTLLFNILG
AACCCTGCCATTCCCTCCCTGATGGCCTT1A 

HepCla #176 

LVQAWKSKKTPMGFSYDTRCFDSTVTESDI

CTGGTCCAGGCTTGGAAAAGCAAAAAGACRCCCATGGGCTTTA<X^ 

HepCla #152 

DEREISVPAEILRKSRRFAQALPVWARPDY
GACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCrCAGGAAA^ 

HepCla #190 

MFAPTLWARMILMTHFFSVLIARDQLEQAL
ATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACAC^ 

HepCla #96 

SVIPTSGDVVVVATDALMTGYTGDFDSVID
AGCGTCATCCCTACCTCXX&JCGATGT^ 

HepCla #94 

CHSKKKCDELAAKLVALGINAVAYYRGLDV
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCXKTO 

HepCla #46 

VLPCSFTTLPALSTGLIHLHQNIVDVQYLY
GTGCTCCCCTGTAGCTlTACXIACACTGCCTGCCCrCAG 

HepCla #53 

KGRWVPGAVYALYGMWPLLLLLLALPQRAY
AAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTATO 

HepCla #87 

SPITYSTYGKFLADGGCSGGAYDIIICDEC
AGCCCTATCACATAC1CCACCTATGGCAAATTCCTCGCCGATGG 

HepCla #196 

ARSVRARLLARGGRAAICGKYLFNWAVRTK 
GCCAGAAGCGTCAGGGCTAGGCTCCrGGCTAGGGGAGGCAGAGCCGCT 

HepCla #170 

KAVAHINSVWKDLLEDSVTPIDTTIMAKNE
AAGGCTGTGGCTCACATTAACTCCGTGTTMSAAGG^ 

HepCla #35 

FTPSPVVVGTTDRSGAPTYSWGANDTDVFV
TTCACACCCTCCCCCGTCGTOH^^ 

HepCla #16 

PGCVPCVREGNASRCWVAMTPTVATRDGKL 
CCCGGATGCGTCCCCTGTGTGAGAGAGGGAAACGCTAGCAGATGC^ 

HepCla #183 

QDCTMLVCGDDLVVICESAGVQEDAASLRA
CAGGATTGCACAATGCTCGTGTGTGG C GATGACCTCCT^ 

HepCla #125 

VAGALVAFKIMSGEVPSTEDLVNLLPAILS
GTGGCTOGCGCTCrGGTCGCCTTTAAGA'^^^ 

HepCla #177 

YDTRCFDSTVTESDIRTEEAIYQCCDLDPQ
TACGATACCAGATGCTTTGACTCCACGGTCACGGAAAGCGA 

HepCla #103 

ELTPAETTVRLRAYMNTPGLPVCQDHLEFW
GAGCTCACCCCTGCCCAAACCACAGTCAGACTGAGAGTO 

HepCla #186 

PQPEYDLELITSCSSNVSVAHDGAGKRVYY
CCCCAACCCGAATACGATCTGGAACTGATTACCTCCTGCTCC^

HepCla #9

LGKVIDTLTCGFADLMGYIPLVGAPLGGAA
CTGGGAAAGGTCATCGATACCCTCACCTGTGGCTTrTGCOT 

HepCla #93 

AIPLEVIKGGRHLIFCHSKKKCDELAAKLV
GCCATTCCCCTCGAGGTCATCAAAGGCGGAAGGCATC^ 

HepCla #112 

GGVLAALAAYCLSTGCVVIVGRIVLSGKPA 
GGCGGAGTGCTCGCOGCTCTGGCTGCCTATrGCCTCA^ 

HepCla #184 

CESAGVQEDAASLRAFTEAMTRYSAPPGDP
TGCGAAAGCXXTOG<CTCCAGGAAGACGCTGCCTCOT 

HepCla #199 

GWFTAGYSGGDIYHSVSHARPRWFWFCLLL 
GGCTGGTTCACAGCJCGGATACTCOGGOGGAGACATTTACCATO 

HepCla #158 

SSSTSGITGDNTTTSSEPAPSGCPPDSDAE
AGCTCCAGCACAAGCGGAATCACAGGCGATAACACAACCACAA^ 

HepCla #100 

RTQRRGRTGRGKPGIYRFVAPGERPSGMFD
AGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCXX3GA 

HepCla #43 

VRMYVGGVEHRLEAACNWTRGERCDLEDRD
GTGAGAATGTATGTGGGAGGCGTCX3AGCATAGGCTCGAG 

HepCla #58 

EAQLHVWVPPLNVRGGRDAVILLMCVVHPT

HepCla #4 

LGVRATRKTSERSQPRGRRQPIPKARRPEG
CTGGGAGTGAGAGCCACAAGGAAAACCTCCGAGAGAAGCCAACCC^ 

HepCla #187 

NVSVAHDGAGKRVYYLTRDPTTPLARAAWE 
AACGTCAGCGTCGCCCATGACGGAGCXX3GAAAGAGAGTGTA 

HepCla #159 

SEPAPSGCPPDSDAESYSSMPPLEGEPGDP
AGOSAACXXXCTXXrCrCCGGCTG^ 

HepCla #63 

IGGHYVQMAIIKLGALTGTYVYNHLTPLRD
ATCGGAGGCCATTACGTCCAGATGGCCATTATCAAACT^ 

HepCla #126 

PSTEDLVNLLPAILSPGALVVGVVCAAILR
CCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTATCC^ 

HepCla #24 

ILDMIAGAHWGVLAGIAYFSMVGNWAKVLV
ATCCTCGACATGATCXSCTCGCGCTCACrGGGGCGTCCT GG CTG 

HepCla #7 

EGCGWAGWLLSPRGSRPSWGPTDPRRRSRN
GAGGGATGCGGATGGGCTGGCTGGCTGCTCAG 

HepCla #21 

WTTQGCNCSIYPGHITGHRMAWDMMMNWSP
TGGACAACCCAAGGCTGTAACTGTM jCA l"rf A CCCTC 

HepCla #17 

WVAMTPTVATRDGKLPATQLRRHIDLLVGS
TGGGTCGCCATGACCCCTACCGTCGCCACAAGGGATGGCAA^ 

HepCla #42

RLWHYPCTINYTIFKVRMYVGGVEHRLEAA
AGGCTCTGGCATTACOTTGCAC^TCAATTACACAATCTTT 

HepCla #172 

VFCVQPEKGGRKPARLIVFPDLGVRVCEKM
GTGTTTTGCGTCCAGCCTT3AGAAAGGCGGAAGGAAACCrc 

HepCla #10 

MGYIPLVGAPLGGAARALAHGVRVLEDGVN
ATGGGATACATTCCCCTCGTGGGAGCCCCTCTGGGAGGC^ 

HepCla #27 

GGNAGRTTSGLVSLLTPGAKQNIQLINTNG
GGCGGAAACGCTCGCAGAACCACAAGCGGACTGGTCAGCCTCCT^ 

HepCla #13 
L A L L 



HepCla #71 

GRDKNQVEGEVQIVSTAAQTPLATCINGVC 
GGCAGAC^CAAAAACCAAGTGGAAGGOGAAGTGCAAATOGTCAGCACA^ 

HepCla #18

PATQLRRHIDLLVGSATLCSALYVGDLCGS 
CCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCXnt3GGAA 

HepCla #83 

HAPTGSGKSTKVPAAYAAQGYKVLVLNPSV 
CACGCTCCCACAGGCrcOTOCAAAAGCACAAAGGTCCCCGCTGCCTA^^ 

HepCla #6 

RTWAQPGYPWPLYGNEGCGWAGWLLSPRGS

HepCla #162

TEDVVCCSMSYSWTGALVTPCAAEEQKLPI

AccGAAGAcxnxxrixjiurixxnrcATGTarrac^^ 
HepCla #55 

ALDTEVAASCGGVVLVGLMALTLSPYYKRY
GCCXrrcGACACAGAGGTCGCCGCTAGCTGT 

HepCla #38

WMNSTGFTKVCGAPPCVIGGAGNNTLHCPT
TGGATGAACTCCACCGGATTCACAAAGGTCrGCGGA^ 

HepCla #168 

SVEEACSLTPPHSAKSKFGYGAKDVRCHAR
AGCGTOGAGGAAGCCTGTAGCCTCArcCCTCCCCAT^ 

HepCla #119 

ISGIQYLAGLSTLPGNPAIASLMAFTAAVT
ATCTCCGGCATTCAGTATCTGGCTGGCCIXZAGCA 

HepCla #3 

QIVGGVYLLPRRGPRLGVRATRKTSERSQP

CAGATTGTGGGAGGCGrrCTACCTCCTGCCTAGG 

HepCla #194 

LHSYSPGEINRVAACLRKLGVPPLRAWRHR
CTGC^TAGCTATAGCCCTGGCGAAATOIATAQ^ 

HepCla #189 

TARHTPVNSWLGNIIMFAPTLWARMILMTH
A(XX3CTAGGCATACrcCTGTGAATAGCTGGCTGGGAAACATTATCATGTT^ 

HepCla #81 

ENLETTMRSPVFTDNSSPPAVPQSFQVAHL
GAGAATCTGGAAACCACAATGAGAAGCCCTGTGTTTACCGATA^ 

HepCla #91 

ATPPGSVTVPHPNIEEVALSTTGEIPFYGK
ATCCCTTTCTATGGCAAA

HepCla #60 

LVFDITKLLLAVFGPLWILQASLLKVPYFV
CFGGTCTTCGATATCACAAAGCTCCTGCTCGC^ 

HepCla #23 

TAALVMAQLLRIPQAILDMIAGAHWGVLAG
ACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTC^^ 

HepCla #98 

CNTCVTQTVDFSLDPTFTIETTTLPQDAVS 
TGCAATACCTGTGTGACACAGAQ^GTGGATTTCT 

HepCla #109 

HGPTPLLYRLGAVQNEVTLTHPVTKYIMTC
CACGGACCCACACOCCTCCTGTATAGGCTCGGCGCTOTGCA 

HepCla #179 

ARVAIKSLTERLYVGGPLTNSRGENCGYRR
GCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTO 

HepCla #39 

CVIGGAGNNTLHCPTDCFRKHPEATYSRCG
TGCGTCATCX3GAGGCX3CTGGCAATAACACACT^ 

HepCla #76 

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL 
ACCTGTGGCTCCAGCGATCTGTATCTGGTCACCAGAC^ 



HepCla #89 

HSTDATSILGIGTVLDQAETAGARLVVLAT
CACTCCACCGATGCCACAAGCATTCTCGGAATCGG^ 

HepCla #130 

YVPESDAAARVTAILSSLTVTQLLRRLHQW
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGA^ 

HepCla #8

RPSWGPTDPRRRSRNLGKVIDTLTCGFADL
ATCCOkGAAGGAGAAGCAGAAACCTCGGCAAAGTGATT^ 

HepCla #33 

GPDQRPYCWHYPPKPCGIVPAKSVCGPVYC 
HepCla #115 

EECSQHLPYIEQGMMLAEQFKQKALGLLQT
GAGGAATGCTCCCAGCATCTGOCTTACATTGAGCAAGGCATGATGC^ 

HepCla #107

YQATVCARAQAPPPSWDQMWKCLIRLKPTL
ATGTGGAAGTGTCTGATTAGGCTCAAGCCTACCCTC

HepCla #34 

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG 
TGCGGAATOGTCCCCGCIAAGTCCCTGTGTG GC CCTGT^ 

HepCla #131 

SSLTVTQLLRRLHQWISSECTTPCSGSWLR
AGCTCCCTCACAGTGACACAGCItXrrGA 

HepCla #161 

DLSDGSWSTVSSEAGTEDVVCCSMSYSWTG
GACCTCAGCGATGGCTCCreGTCCACCGTC^ 

HepCla #108 

WDQMWKCLIRLKPTLHGPTPLLYRLGAVQN
TGGGATCAGATGTGGAAATGCCTCATCAGACTGAAAOC^

HepCla #116

LAEQFKQKALGLLQTASRQAEVIAPAVQTN
CTGGCTGAGCAAT



HepCla #118

WQKLEVFWAKHMWNFISGIQYLAGLSTLPG
\TGTGGAATTTCATTAGCGGAATCCAATACCTCGCCGGACTGTCCAC^ 



HepCla #129

LIAFASRGNHVSPTHYVPESDAAARVTAIL
CTGATTGCCrTTGCCTCCAGGGGAAACCAT
^CACTATGTGCCTGAGTCCGACXJCTGCCGCTAGGGTCACCXSCTATCCrC



HepCla #19

ATLCSALYVGDLCGSVFLVGQLFTFSPRRH 
GCCACACTGTGTAGCGCn'CTGTATGTGG 



HepCla #102

SSVLCECYDAGCAWYELTPAETTVRLRAYM
TAOGAACTGACACCOGCTGAGACAACCGTCAGGCTCAGGGCTTACATG 



HepCla #122

GWVAAQLAAPGAATAFVGAGLAGAAIGSVG
GGCltM3GTGGCTGC!CCAACTGGCTGCCCCrGGCG^ 

HepCla #29

SWHINSTALNCNESLNTGWLAGLFYQHKFN
AGCTGGCACATTAACTCCACCGCTCTGAATTXKZAAT^ 

HepCla #164

NALSNSLLRHHNLVYSTTSRSACQRQKKVT 
AACGCTCTGTCCAACTCCCTGCTOVG<^ 

HepCla #1

AAMSTNPKPQRKTKRNTNRRPQDVKFPGGG
GCCG<TTATGTCCACCAATCCCAAACCCCAAAGGAAAAC^ 



HepCla #106

SQTKQSGENFPYLVAYQATVCARAQAPPPS
AGCCAAACCAAACAGTCCGGCGAAAACTTTCtXTTAT
ATCAGGCTACCGTCTGCGCTAGGGCTOUX»CTCCCCC^



HepCla #36

APTYSWGANDTDVFVLNNTRPPLGNWFGCT
GCCCXTACCTATAGCTGGGGCGCTAAOGATACCGAT 

HepCla #156

VPPPRKKRTVVLTESTLSTALAELATKSFG
GTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTC 

HepCla #165

STTSRSACQRQKKVTFDRLQVLDSHYQDVL
AGCACAACCTCCAGGTCCGCCTGTO^ 



HepCla #90

DQAETAGARLVVLATATPPGSVTVPHPNIE
&TCCCAATATCGAA 



HepCla #141

FHYVTGMTTDNLKCPCQVPSPEFFTELDGV
TTCCATTACGTCACCGGAATGACAACCGAT
MTTCrTTACCGAACTGGATGGCGTC



HepCla #198

LKLTPIAAAGRLDLSGWFTAGYSGGDIYHS
CTGAAACTCACACCCATTGCCGCTGCCGGAAG^^ 



HepCla #117

ASRQAEVIAPAVQTNWQKLEVFWAKHMWNF
GCCTCCAGGCAAGCCGAAGTGAT
ITATGTGGAACTT7



HepCla #181

CRASGVLTTSCGNTLTCYIKARAACRAAGL




HepCla #166 

FDRLQVLDSHYQDVLKEVKAAASKVKANLL
TTCGATAGGCTCCAGGTCCTGGATAGCCATra 

HepCla #180 

GPLTNSRGENCGYRRCRASGVLTTSCGNTL
GGCCCTCT<^CAAACTCCAGGGGAGAGAATTGCG ACCCTC 

HepCla #136 

IMHTRCHCGAEITGHVKNGTMRIVGPRTCR
ATCATGCACACAAGGTGTCACTtnt3GCGCTGAGA 

HepCla #144 

EVSFRVGLHEYPVGSQLPCEPEPDVAVLTS
GAGGTOtf5CTTTAGGGTCGGCCTCCACGAATACCC^ 




HepCla #59 

GRDAVILLMCVVHPTLVFDITKLLLAVFGP
GGCAGAGACCCTGTGATTCTCCICATGTGTGTGGIXX ^ 

HepCla #146 

MLTDPSHITAEAAGRRLARGSPPSMASSSA
ATGCTCACCGATCCCTCCCACATTACCGC^ 

HepCla #78 

SPRPISYLKGSSGGPLLCPAGHAVGIFRAA
AGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCTC^ 

HepCla #32 

GACTTTGA ° 6WGPISYAI,GSGPDQRPYCWHYPPKP 
HepCla #128 

RHVGPGEGAVQWMNRLIAFASRGNHVSPTH
AGGCATGTGGGACCCGGAGAGGGAGCCGTra^^ 

HepCla #50 

CLWMMLLISQAEAALENLVILNAASLAGTH

T^CTCTGGATGATGCTCCTGATTAGCC^ 
HepCla #114 

IIPDREVLYREFDEMEECSQHLPYIEQGMM

ATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGAC^ 
HepCla #47 

LIHLHQNIVDVQYLYGVGSSIASWAIKWEY
CTGATTCACCTCCACOU^CATTGTGG^ 

HepCla #200 

VSHARPRWFWFCLLLLAAGVGIYLLPNRAA

GTGTCCCACGCTAGGCXrrAGGTGGTTCTGGTTC^ 
HepCla #85 

AATLGFGAYMSKAHGIDPNIRTGVRTITTG
cccGCTACxxnraxrrm^ 

HepCla #62 

RVQGLLRICALARKMIGGHYVQMAIIKLGA
AGGGTCCAGGGACTGCTCAGGATTTGCGCTCT^^

HepCla #153 

* * - L -- P V * ARPDYNPPLV BTWKKPDYBP 

AGGAGATTCGCTCAGGCTCT£CCK^ 

HepCla #72 

TAAQTFLATCINGVCWTVYHGAGTRTI asp 
ACaSCTGCCCAAA l ll'ril^ 

HepCla #65 
W A H N G L R D L A V AVBPVVFSQMETKLITWGA
TGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGG 

HepCla #74

K G P V 1 Q " Y T H V D QDLVGHPAPQGSRSLT PC 
AAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAG 

HepCla #151 

KVVILDSFDPLVABBDERBISVPAEILRKS 
AAGGTOGTGATTCTGGATAGCTTTGACCCTCTGGTCGCOGA^ 

HepCla #64

L T ° T Y V Y NHLTPLRDWAHNGLRDLAVAVB P 
CTGACAGGCACATACGTCTACAATCACCTCACCCCTCTGAGAG^ 

HepCla #80 

V C TRGVAKAVDPI PVBNLBTTMRS P 
GTGTGTACCAGAGGCGTCGCCAAAGCCGTCGACTT^ 

HepCla #95 

ALGINAVAYYRGLDVSVI PTSGDVVVVATD 
GCCXTTOGGCATTAAOGCTGTGGCTTACTATAGGGGACrcG^ 

HepCla #111 

M S A D L B V V TSTWVLVGGVLAALAAYCLSTG 

ATGTCCGCCGATCTGGAAGTGGTCACXTCCACC^ 

HepCla #97 

ALMTGYTG D P DSVIDCNTCVTQTVDFSLDP 
GCCCTCATGACAGGCTATACCGGAGACTTTGACrcCGTGA 

HepCla #2 

MTNRRPQDVKPPGGGQIVGGVYLLPRRG PR 
AACACAAACAGAAGGCCTO^GGATGTGAAATTCCCTG^ 

HepCla #11 

RALAHGVRVLBDGVNYATGNLPGCSPS I PL 
AGGGCTCTGGCTCACXSGAGTGAGAGTGCTCGAGGATGGCCT 

HepCla #169 

S K P G YGAKDVRCHARKAVAHINSVWKDLLE 
AGCAAATTCGGATACGGAGCCAAAGACGTCAGGTGTCACGCTAGG^ 

HepCla #28 

TPG AKQNIQLINTMGSWHIHSTALNCNESL 
ACCCCTGGCGCTAAGCAAAACATTCAGCT^ 

HepCla #30 

N T G WLAGLPY QHKFNSSGCPERLASCRRLT 
AACACAGGCTGGCTGGCTGGCCTCTTCrATCAGCATA 

HepCla #49 

VVI#LFLLLADARVCSCI*NMMLLI SQABAAL 
HepCla #192 

P C B I Y G A C Y SIEPLDLPPI IQRLHGLSAPS 
GACTGTCAGATTTACGGAGCCTCTTACTCCATCGAACCCCTCG 

HepCla #73 

* T V Y HGAGTRTIASPKG PV1 QMYTHVDQDL 
TCGACAGTGTATCAOGGAGCCGGAACCAGAACCATTCC^ 

HepCla #101 

YRPVAPGERPSGMPDSSVLCBCYDAGCAWY 
TACAGATrCGTCGCCCCTCGCGAAAGGCCTAGCGGA^ 

HepCla #45 
R S B L 



HepCla #195 

LRKLGVPPLRAWRHRARSVRARLLARGGRA 



Figure 26 (Cont) 




CTGAGAAAGCTCGGCCTCCCCCCTCTGAGA^ 
HepCla #121 

S PLTTSQTLLFN I LGGWVAAQLAAPGAATA 
AGCCCTCTGACAACCTCCCAGACACTGCTCTTCA^ 

HepCla #61 

LW ILQASLLXVPYFVRVQGLLR I C A L A R K M 
CTGTGGATCXTTCCAGGCTAGCCTCCTGAAAGTGC^ 

HepCla #137 

v k n g t m r i v gprtcrnmwsgtfpinayttg 
gtgaaaaacggaaccatgaggattgtgggacccagaacctgt;^^ 

HepCla #92 

bvalsttgbipfygkaiplbvixggrhlif 
gaggtcgccctcagcacaaccggagagattc^ 

HepCla #188 

ltrdpttplaraawetarhtpvnswlgni I 

CTGACAAGGGATCCCACAACCCCTCTCGCT 
HepCla #140 

RVSAEBYVEIRRVGDFHYVTGMTTDNLKCP 
AGGGTCAGCGCTGAGGAATACGTCGAGATTAGGA^ 

HepCla #155 

P V V H _?^ C __?_ J* PPPRSPPVPPPRKKRTVVLTBS 
CCCGTCGTGCAT 



HepCla #157 

TLSTALABLATKS FGSSSTSGI TGDNTTTS 
ACCCTCAGCACAGCCCTCGCCGAACTWCTO 

HepCla #135 

V S C QRGYKGVMRGDGIMHTRCHCGAB ITGH 
GTGTCXrTGCCAAAGGGGATACAAAGGCGTCltMA 

HepCla #20 

V P L V G Q L F T FSPRRHWTTQGCNCS I YPGHI 
GTGTTTCTGGTCGGCCAACTGTTTACCTTTAG 

HepCla #123 

FVGAGLAGAAIGSVGLGKVLVDILAGYGAG 
TTCGTCGGCGCTGGCCTCGCCGGAGCCGCT 

HepCla #133 

° 1 " D " 1 C BVLSDFKTWLKAKLM PQLPGZ PF 
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTCAAAA 

HepCla #15 

* S S I V Y BAADAILHTPGCVPCVREGNASRC 
AACTCOVGCATl'GTGTATGAGGCTGCCGATGCC^T^ 

HepCla #31 

S S G C P B RLASCRRLTDFDQGWG PI SYANGS 

AGCTCCGGCTGTCCCGAAAGGCTC5GCCTCCTGCAG^ 

HepCla #178 

R T B B A I Y 0 C C D L D P Q A R V A I XS I*TBRLYVG 

AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCGACCCIt^ 

HepCla #69 

VSKGWRLLAPITAYAQQTRGLLGCI ITSLT 
GTGTCCAAGGGATGGAGACTGCTCGCCCCTA 

HepCla #191 

PP _ S VLIARDQLBQALDCBIYGACYS IEPLD 
TTCTTTAGCGTCCTGAT7XXX3^GAGA 

HepCla #142 

CQVPSPBPFTELDGVRLHRFAPPCKPLLRB 
TGCCAAGTGCCTAGCCCTGACTTTTTCACA^ 
HepCla #182

TCYIKARAACRAAGLQDCTMLVCGDDLVVI 
ACCTGTi a<^TTAAGGCTAGGGCTGCCTGTAGGGCTGCCGGA 

HepCla #86 

IDPNI RTGVRTITTGSPITYSTYGKPLADG 
ATCGATCCCAATATCAGAACC^GAGTGAGAACCATTACOVCAGGCTCCCCCATTA 

HepCla #44 

CNWTRGBRCDLEDRDRSELS PLLLSTTQWQ 
TGCAATTGGAO^GGGGAGAGAGATGCGATCTGGAAGACAGAG 

HepCla #22 

TG HRMAWDMHMNWS PTAALVMAQLLRI PQA 
ACCGGACAO^GAATCGCTTGGGATATGATC^ 

HepCla #127 

PGALVVGVVCAAI LRRHVGPGBGAVQWMNR 
CXXGGAGCCXTCGTGGTCGGCGTOrrGTGTGCCGCT 

HepCla #149 

HDSPDAELIEANLLWRQEMGGNITRVESBN 
CACGATAGCXTTGACGCTGAGCTCATCGAAXXXM 

HepCla #105 

B G V P TGLTHIDAHFLSQTKQSGENFPYLVA 
GAGGGAGTGTTTACCGGACTrGACACACATTGACXOT 

HepCla #5

RGRRQPI PKARRPEGRTWAQPGYPWPLYGN 
AGGGGAAGGAGACAGCCTATCCCTAAGGCTAGGAGACCCGAA 

HepCla #173 

LIV FPDLGVRVCEKMALYDVVS KLPLAVHG 
CTGAriXntSTrmXt^TCTGGGAGTGAG 

HepCla #12 

YATGNLPGCSFSI F I» L A I> L S CLTVPASAYQ 
TACGCTACCGGAAACCrCC a J GG ATGCTCC^^ 

I M S G E V 
ATCATGAGCGGAGAGGTC 

HepCla #160 

SYSSMPPLBGBPGDPDLSDGSWSTVSSEAG 
AGCTATAGCTCCATGCCTCCCCTCGAGGGAG^ 

HepCla #150 

RQBMGGN ITRVESENKVVILDS FD PLVAEB 
AGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAA^ 

HepCla #75 

V G W PAPQGSRSLT PCTCGSSDLY LVTRHAD 
GTGGGATCGOCTGCOCCTCAQQGAAGCAGAAGCCTCftCCCCTTGC^ 

HepCla #88 

G C S G G AY D I I I C D BCHSTDATS I LGIGTVL 
GGCItgTAGCGGAGGCGCTTACGATATCATTATCTGTGAC^ 

HepCla #99 

T F TIBTTTLPQDAVSRTQRRGRTGRGKPGI 
ACCTTTACCATTGAGACAACCACACTGCC1CAGGATGCCGTCA 

HepCla #40 

DCPRKHPBATYSRCGSGPHITPRCLVDYPY 
GA CWrrn^ GAAAGCATCCCGAAGCCACATACTCCM 

HepCla #201 

LAAGVGIYLLPMRAA 
CTGGCTGCCGGACTGGGAATCTATCTGCTCCCCAATAGGGCTGCC 
HepCla #163

ALVTPCAABEQKLPINALSNSLLRHHNLVY 
GCCCTCGTGACACarrGTGCCGCT^ 

HepCla #132 

ISSBCTTPCSGSWLRDIWDWICEVLSDFKT 
ATCTCCAGCGAATGCACAACCCCTTGCTCCGGCTCC^ 

HepCla #134 

WLKAKLMPQLPGI P P VSCQRGYKGVWRGDG 
TGGCTCAAGGCTAAGCTC^TGCCrcAGCTCC 

HepCla #41 

SGPWITPRCLVDYPYRLWHYPCTINYTI FK 
AGCGGACCCTGGATCACACCCAGATGCCTCGTGGATTACrt^ 

Artificial Protein: 



VI PVRRRGBSRGSI^PRPISYI^GSSGGPAJttGREILIjGPADGMVSK^^ 

KlilTWGADTAACGDI INGLPVS LLC PAGHAVG I FRAAVCTRGVAXAVDF I PVCWI VGRIVLSGKPAI I PDRBVLYREFDEMPCTPLPAPNYTFALWR 

VSABEYVEI RRVGDALYDWSKLPLAVMGSSYGFQYS PGQRVEFI SVKXWWLQYFLTRVBAQLHVWVPPLNVRGENLVI LNAAS LAGTHGLVS FLVFF 

CFAWYLLPPIIQRLHGI^APSUISYSJCTINRVAACNPPLVBTO 

NTRPPIXaWFGCTTraNSTGFTKVCGAPPFTEAHTOYSA 

ITSLTGRDKNQVEGBVQIVSSSPPAVPOSFQVAHLHAPTGSGKSTKVPA 

AGRTTSGLVSI»IiEVTLTHPVT!CYTMTCMS^ 

LDLS IAYFSMVGNWAKVLVVLLLFAGVDAETHVniliARGS P PSMASSSASQLSAPSLKATCTANGLVS FLVPFCPAVTYLKGRWVPGA VYALYGMQL PC 
EPEPDVAVLTSMLTDPSH ITABAAGRDSVTPI DTTI MAKNBWCVQPEKGGRK PAR YAAQ^^ 

DC PNSS I VYEAADAI UfTSSYGFQYSPGQRVBFLVQAWKSKKTPMGFSDTAACGDI INGLPVSARRGRE I LLG PATX5MSQLSAPS LKATCTANHDS PD 
AELI RAHLLWHPAXASU4AFTAAVTSPLTTSQTLLFNI I/3LVQAWXSKKTPMGFS YDTRCFDSTVTE SD I DERE I SVPAE I LRKS RRFAQAL PVWAR P 
DYMPAPTIMARMIIMTHFFSVIJARDOLSOA^ 

TTLPALSTGLI HLHQNI VDVQ YLYKGR WVPGAVYALYGMW PT ■! .T .LT .LALPQRAYS P I TYSTYGKFLADGGCSGGAYD 1 1 1 CDECARSVRARLLARGGR 
AAICGKYI*FNWAVRTKXAVAHINSVWKDLLEDS^ 

VATRDGKLQDCTMLVCGDDLVVI CESAGVQET1AASLRAVAGALVAFKI MSGEVPSTEDLVNLLPAI LS YDTRCFDSTVTESDI RTEEAI YQCCDLDPQ 
ELTPAETTVRIjRAYMNTPGLPVCQDHLEFW PQPEYDLELI TSCS SNVSVAHDGAGIOIVYYLGKV IDTLTCGFADLMG Y I PLVGAPLGGAAAI PLEVI K 
GGRHLI FCHSKKKCDBIJUVKLVGGVLAALAAYCLSTGCVVTVGRIVLSGKPA 
SHARPRWFWFCI*IJ>SSSTSGITGDNTTTSSEPAPSGCP 

DLEDRDEAQLHVWPPLWVRGGRDAVI LLMCVVHPTI^^ PKARRPEQWSVAHDGAGKRVYYLTRDPTTPLARAAWESE 
PAPSGCPPDSDABSYSSMPPLEGBPGDPIGGHYVQMAI I KLGALTGTYVYNHLTPLRD PSTBDLVNLLPA I LS PGALWGWCAAI LRILDMIAGAHW 
GVLAGIAYFSMVGHWAKVLVEGCGWAGWLLSPR I TGHRMAWDMMMNWS PWVAMTPTVATRDGKLPAT 

QUWHIDIXVGSRLWHYPCTINYTIFKVRMYVGGVEH^ 

DGVNGGNAGRTTSGLVS LLT PGAKQNI QLINTNGLALLS CLTVPASAYQVRNSTGLYHVTNDCPGRD KNQ VEGBVQI VSTAAQTFTiATCI NGVCP ATQ 
IJIRH I DLLVGSATLCSALYVGDLCGSHAPTGS^ PRGSTEDVVCCSMSYS 
WTGALVTPCAAEEQKLP IALDTEVAASCGGVVLVGLMALTLS P YYKR YWMNSTGFTK VCGAP PCVI GGAGNNTLHCPTSVBEACS LTP PH SAKS KFG Y 
GAIO)VRCHARISGIQYLAGLSTI*PGNPAIASI>lAFTAAVTQI PGB INR VAACLRKLGVP PLRAWR 

HRTARHTPVNSVnit^I I MFAPTLWARMI LMTHENI*ETTMRS FVFTDNSSPPAVPQSFQVAHIATPPGSVTVPHPNIREYALSTTGBI PFYGKLVFDXT 
KI,I^VFGPLWILQASIJJCVPYFVTAAL\7MA^ 

NEVTLTH PVTKYIMTCARVAI KSLTBRLYVGGPLTNSRGENCGYRRCVIGGAGIOITLHCPTDCP PVRRR 

GDSRGSLLN>WSGTFTXNAYTTGPCTPI*PAPNYTFALWHSTDA 

RPSWPTDPRRRSRNLGXVIDTLTCGFADLGPDQRPYCWHY^ 

AQAPPPSWDQMWCI*IRlJCPTLrciVPAKSVCGPVYCFTPSP 

BDVV(XrSMSYSWrcWDQMI«CXIRUCir^^ 

LSTLPGLI AFAS RGNHVS PTHYVPBSDAAAR VTAI LATLCSAL YVGDLCGSVFLVGQLFTFS PRRHSSVLCECYDAGCAWYBLTPAETTVRLRAYMGW 

VAAQLAAPGAATAFVGAGIJtfaAAIGSVGSimiNSTALNCNES 

RKTKROTTO&PQDVKFPGGGSQrTKQSGSHFP^ 

TAIABIJVTKSroSTTSRSACQRQKXVTFDRLQVLDSHY 

UXIVIJU-TPIAAAGRIjDLSGWFTAGYSGGD I YHSASRQAEVI A P I KARAACRAAGLFDRL 

QVU)SHYQDVI^EVKAAASKVKA1ILI^PLTNS^ 

VGSQIJ<!E PB PDVA VLTSKEVXAAAS KVKANIXSVEEACSLTP PHSAXGRDAV I LLMCWH PTLVFD I TICLLLAVFG PMLTD PSH I T AEAAGRRLARG 
SPPSMASSSASPRPISYUa^SGGPIOiCPAGHAVGIFRAADFDQGWP 

THCLWMKLLI SQAEAALBNLVI LNAASLAGTH I 1 PDRBVLYRBFDEMEECSQHLPY I EQGMML IHLHQNI VD VQYLYGVGS S IASWA I KWEYVSHARP 
RWFWFCLLLIJ\AGVGIYLLPWRAAAATLGPGAY RTGVRTITTGRVC^I*IJIIOUJUIKMIGGHYVOMAI I KLGARRFAQALPVWARPD 

YN PPLV^TWKKPDYEPTAAQTFIATCINGVCVTVYTIGAGTRTIAS PWA>INGLRDLA VAVEPWFSQMETKLI TWGAKG PVI QMYTNVDQDLVGW PA PQ 
GSRSLTPCKWI1J)SFDPLVAEEDBREISVPAEILRKSLTGTYVYNHLTP PVENLETTMRSPVFTDN 
ALGINAVAYYRGLDV5VI PTSGDVVVVATDMSADI*BVVTSTWVLVGGV1A^^ 

VKFPGGGQI VGGVYLIJ^RG PRRAIAHGVRVLEDGVNY^ X PLSKFGYGAKDVRCHARJCAVAH I NSVWXDIiLET PGAKQN I QLINTNGS 

WH I N STALNCNBS LKTGWLAGI*PYQHXFNS SGCPBRLASCRRLT^ SOAEAALDCE IYGACYS I E PLDL P P 1 1 QRLH 

GLSAFSVTVYHGAGTRTIASPJCGPVIQHYT 

KLGVPPIJiAWRHR&RSVRARLLARGGRAS PLTTSQ7LLFN I IXJGWVAAQ LAAPGAATALW I LQASLLKVP YFVRVQGLLR I CALARKMVKNGTKR I VG 
PRTCRHMWSGTFPINAYTTGEVALSTTGEI PFYGXAI PLEVI KGGRHLI FLTRDPTTPLARAAWBTARHTPVNSWLGNI I R VSAEEYVE I RRVGDFHY 
VTGMTTDNLKCP PWHGCPLP PPRSPPVPPPRXIOnVVLTBSTLSTALAELATKS FGS SSTSGI TGDNTTTSVSCORGYJCGVWRGDG I MHTRCHCGAE 
ITGHVFLVGQLFTFSPRRHWTTQGCNCSIYPGHIFV^ 

VYEAADAI LHTPGCVPCVRBGHASRCSSGCPERLASCRRLTOFDQG KSLTERLYVGVSKGWRLLAPIT 
AYAQQTRGLLGCIITSLTFFSVI*IARDQLEQALDCEIY^ 
LVCOTDLWIIDI^IRTGVin-ITTGSPITY^

IPKARRPEGRTWAQPGYPWPLYGNLIWPDLGVRVCBKMALYDWSia>PIAVM^ 
C^/AGALVAFKIMSGEVSYSSMPPI^BPGD^ 

YLVTRHADGCSGGA YDI 1 1 CDECHSTDATS I LG I GTVLTFT I ETTTLPQD AVSRTQRRGRTGRG KPG I DCFRKH PEATYSRCGSG PWI TPRCLVD Y P Y 

IJU^GVGlYLLPNRAAAIiVTPCAAEEQXL^ 

GVWRGDGSGPW I TPRCLVD Y PYRLWHYPCTINYT I FK 

Artificial DNA:



GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTC 
GGGAAGGGAAATCCTCC^^ 

AAGCTCATC^^ 

CTTTAGGCOTWOCTCTGCACAAGGGGAGT^^ 

CTGCCATTATCCCTGACAGAGA 

GTGTCCGCCGAAGAGTATtnTSGAAATCAGW 

CTTTCA<n'ATAGCCCTGGCCAAAGGGTCGAGTT^ 

QG GTGCC TCCCCTCAACGTCAGGGGAGAGAATCTGGTC 

TGCTTTGCCTGGTACCTCCTGCCTCC^ 

CTCGCGTCGGCTCCAGCATTGCCTCCTGG^ 

AACACAAGGCCTCCCCTtXXX^ 

GACAAGGTATAGCGCTCCtX!CTGGCGATtrCCCCT^ 

TCGCCCTCCCCCAAACGGCTTAOGCTCTGGATACCGAAGTCGCTC 

ATQR^AGCCTCACCGGAAGGGATAAGAATCAGGTCGA^ 

AAGGCGTCTTCACAGGCCTCACCCATAT^ 



GGTCGTOIOWVGCRCATGGGTCCTGGT^ 

AATACTTTCTGACAAGGGTCGCCATTTGCGGAAAGT^ 

CTGGATCTGTCCATCXXrrTACTTTAGC^ 

GACAAGGCTCX5CCAGAGGCTCCCCCXXT 

TCGTCTCCTTCCT^ 

GAGCCTGAGCCTGACGTCGCCGTCCTGACAAGCAT^ 

CACAACCATTATGGCTAAGAATOtf^^ 

TCCTGAATCCCTCCgTGGCTGCCACAC^ 

GACTOTCCCAATAGCTCC ATOGT CTAC^ 

CCTO^TGCAAG CCTGGAA GTCCAAGAAAACC^^ 

GWaa^GAGJ^TTCTGCTCGG^^ 

GCCG AACTCAT TGAGGCTAACC^^ 

CCTCCTGTTTJUVCATTCTGGGAC^ 

CAGAGTC CGACAT TGACGA AAGGGAAA TCTCC^^ 

GACTATAI^iii\^^c^x>»CCC^^ 

C^ruATCwriiCCTCCGGOC^TCTGGlW 

AAAAGAAATGC^ 

ACCACACTXXXrTGCCCTCAGCACAGGCCTCATC^ 

GTATGCCCTCTOCGGAATCTGGCCCCT^^ 

TCGCCGATGGCGGATGCTKXXjGCGGAGCCT 

GCCGCTATCTGTGGCAAATACCTCTTCAATTGGGC^^ 

CGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAA 

CCTGGG(»GCCAATGACACAGACGTCTTCGTCOT 

GTGGCTACCAGAGACGGAAAGCTCCAGGATTGCACAATGCTCGTGTGTC 

C^AGCCTCAGGGCTCrrG GCTGGCGC rc^ 

TTCTGTCCTACGATACCAGATGX^^ 

GaGCTCACCCCTGCCGAAACCACAGTGAGACTGAGA 

CGAATACGATCTGGA ACTGATrACCTCCTG CTCO^ 

ATACCCTa^ 

CCTCAGCACAGGCTGTGTGGTCATCGTCGGC^ 

GAGCCTTTACCtSAAGCCATGACCAGATACTCCGCCCC^ 

AGCCATGCCAGACCCAGAJU,frilWriTriAX-rCV 

CCCTAGCGO^TGCCtTCCCGATAGCGAT^^ 

GAGAGAGACCCTCCGGCATGTTCX^TGTCAGAATGTATGTGGGA^ 

GACCTCGAGGATAGGGATGAGGCTCAGCTCC^^ 

CCAT OCCA ^A CJGGGAGTG AGAGCCACAAGGAAAACCTCOGAG 

GAAACCrtOiGCGreGCeCATG^^ 

C GTCC Ai»TGGCCATTATCAAACTGCGA<XX!CTCAOT 

ATCTGCTCCCCGCTATOTCAGC^^ 

GGC^mXTGGCTGGCATTGCTOTTTCTCC^ 

CftGGCCATACGATGGOCTta3GACATGATGATGAAC^ 
CAGCTCAGGAGACACATTGACCTC CTCGTOGGCTCC AGGCICTGGCACT

CGGAGTGGAACACS^CTGGAAGCCGCTC^ 

TCAGGGTCTGCGAAAAGATGATGGGATACATT^^ 

GACGGAGTGAATGGCXSGAAACtXTUGt^^ 

OGGACTGGCTL"IXXrrCAGCrGTCTGACAGT^ 

GAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATOGTCAGC^C^ 

CTGAGAAGGCATATCGATCTCCTCCTGGGAAGCGCT 

CAAAAGCACAAAGGTCCCCGCTGCCTATGCrGCTCAGGGATACAAAGT^ 

GGCCCCTCTACGGAAA^^ 

TGGACAGGCGCTCTGGTCACX?CCTTGCGCIt5^ 

GGTCGGCCTCATGGCTCTGACACTGT^ 

TTGGCGGAGCCGGAAACAATACCtrrCra 

GG CGCT AAGGATGTGAGATGCCATGCCAGAATCTCCGG<^ 

GGCTTTCAaUKXXXrTGTGACACAGATTGTGGGAGGCGT^ 

AAAGGTTCCAGCCTCTGCMAGCTATAC^ 

CACAGAACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACAT^^ 

GAATCTGGAAACCACAATGAGAAGCCCTGTGTTTACCGATAAC^^ 

CT(a3CTCCGTGACAGTGCCTCACCCTAACATT^ 

AAGCTCCTGCTCGCC^Vri^GGACCCCTCTGGATT^^ 

GCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCCCAT'^^ 

CCCTGGATCCCACATTCACAATCGAAACX!ACAACCCrCCCCCAAGAC^ 

AAOSAAGTGACaCTGACaCACCCTGTCACAAAGTATA 

CCTCACCAATAGCAGAGGCGAAAACTCrroGCTATAGGAGATGCXr^ 

AACACXXTrGAGGCTACCTATAGCAGATGC^ 

G6CGATAGCAGAGGCTCCCTGCTCAACAT6TGGTCOGGCACA 

CACATTOGCTCreTGGCACTCCACOGATGCCACAAGCATTCT^ 

TOXX3^TACGTCCCCGAAAGCGATCra 

AGGCCTA GCTG G GGCCCTA CCGATCCCAGAAGGAGAAGCAGA 

CCAAAGGCCTTACTGTTGGCATTACCCTCCCAAArc 

ATCTCCCTTACATTGAGCAAGGCATGATGCTC^ 

GCCCAAGCCCCT CCXXX n'AGCTGGGACCAAATGTGGAAGT^ 

rcCTGTCTATTGCTTTACCCCTAGCCCT 

AATGGATTAGCTCCGAGTGTACCACACCCTGTA<X?GGAA 

GAGGATGTGGTCTGCTGTAGCATGAGCTATAGCTGGACCGGA 

CCCTCTCCTCTACAGACTGGGAGCCGTCCAGAATCTC 

TCATCGCTCCCtXTGTGCAAACCAATrGGCAAAAGCTOGAGGTC 

CTGTO^CCCTCCCOGGACTGATTG 

CXXTTATLVltJl^TJACACTCltiTAW 

ATAGCTCCGTGCTCItJCGAATGCIATGACGCTO^^ 

GTGGCrGCCCAACTGGCTG^tXr-roGCGCTGC^ 

CTCCaCCGCTCTGAATT(XrAATGAGTCCCTGAATAC^ 

TCAGGCATCACAATCIGGTCTACTCCACCACAAGCA 

AGGAAAACCAAAAGGAATACCAATACGAGACCCC^ 

TCAACAATACCAGACCCXXr^^ 

ACCGCTCTGGCTGAGCTCGCCACAAAGTCL^^^ 

GCT0GACTCCCACTATCAGGAT6TGCTCC»UX3UWX^ 

TCCCCCATCCCAATATCGAATAIJCATTACGTC^ 

CTGGATGGOTItXTGAAACTGACACCC^^ 

CTCCGCCTCCAGGCAAGCCGAAGTGATTGCC^^ 

CAGGTCCTGGATAGCCATTACCAAGACGTCCrGAAAG 

GGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCGGAGTGCTCA 

CTGAGATTACCGGACA(XTCAAG 

GTGGGAAGCCAACItXtrritXXjAACCCGAACCCX^Tg 

GTCCGTGGAAGACGCritXTCtX"IX>ACACCCCCTCACTCOGC^ 

TTGACATTACCAAACTGCTCCfGGCrGl^^ 

AGCCCTCCCrCCATGGCTAGCTCCA^ 

C^O^ri"rrtAGA«X^CTCACT TTCAC ^^ 

ATCCCCCTAAGCXTAGGCATGTGGGACCCGGAGAGGGAGCC^ 

ACCC ATTG CCTCTGGATGA TGCTC CTGATTA^ 

CATTCXXGATAGGGAAinGCTCTACAGAGACTTTGAOGAAATGGA 

TCCACCAAAACATTGTGGATGTGCAATACCTCTACGGAGTG 

AGGTCGTTCTGOTCTCTCT^^ 

CATGAGCAAAGCCCATGGCATTGACCCTAACATTAGGACAGGOGTC^ 

CTAGGAAAATGATTGGOGGACACTATGTGCAAATGGCTATCATTAAGCT 

TACAATaXXrTCTGGTCGAGACATGGAAAAACCCTGACTA 

G6TCTACCATG6C6CTGGCACAAGGACAATCGCTAGCOCTTGGGCTC 

AAATCGAAACCAAACTGATTACCTGGGGCGCTrAA 

CXXntXAGGTCXX^tSACACCCTGTAAGGTCGTG^ 

GATTCroAGAAAGTCCCTGACAGGOVCATACGTCT 

TCGAGCXrTCTUIXrrACCAGAGGCGTCGCCAAAGCCGTCGA^ 

GCCCTOGGCATTAACGCTGTGGCTIACTATAGGGGACTGGATGT^ 
CGATCTGGAA GTGGT CACCTCCACCTGGGTGCTCGTGG^

ATACCGGAGACTT^CTCCGTGATTGACTGT 

GTGAAATTCCCTGGCGGAGGCCAAATCGTCGGCGGAGTC 

GGATGGCGTCAACTATGCCACAGGCAATCTGCC1X5GCTCT 

CTAGGAAAWXXTOXTCATATC^ 

TGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAG^ 

CCCTGAGAGACreGCTA(XrrGTAGGAGACTGACAG^ 

TCATCTCCCAGGCrGAGGCTGCCCTCGACTGT^ 

CCAAGACCTCTAOVGATTCGTCGCCCCTG^ 

ATAGGTCCGAGCTCAGCCCrCTGCrCCT^TCCACCA 

AAGCIXXXJCGTCCCCXXTCTGAGAGCCTGGAGGCATAGGG 

CTCCCA^ACACTGCTCTTCAATATCCTOGG 

TCCIt;AAAGTGCCTTACTTTGTGA 

CCCAGAACCTGTAGGAATATGTGGAGCGGAACCTTTCCCATTAAC^ 
OSGAAAGGCTATCCXrrCTGGAAGTt^^ 

CAGCCAGACACACACCCGTCAACTCCTGGCTCGGCAATATCATTA 

GTGACAGGCATGACCACAGACAATCTGAAATGCCXrrCC^ 

GAAAAGGACAGTGGTCCTGACAGftCTCCAC^^ 

GAGACAATACCACAACCTCXTGTGTCCTGCC^^ 

A TCACAGG Ct^TGTGTTTCTGGTCGGCCAACTGTTTO 

a^TTTTCGTCGGCGCTGGCC^ 

TWTGGGATTGGATTTGCGAAGTGCTC 

GTG^TGAGGCTGCCGATGCCATTCTGCATACCCCTGG 

<XHXXJCCTCCTGCACAA€GCTCACCGATTTCGA 

GTGACCTCGACCCTC»GCOTAGGG^^ 

GCCTATGCCXIAACAGACAAGGGGACTGCTCGGCTGTATCATTACCT^ 

GGATTGCGAAATCTATGGOGCTTGCTATAGCATTGAGCCTCTC 

ATAGGT^TCCCCCTCCCTGTAAGCCTCTGCTCAGGGAA 

CTGGTCTGCGGAGACGATCTGCTCGTGATTATCGATCC^ 

CGGAAAGTTTCTGGCTGACGGATGCAATTGGACAA^ 

CAACCCAATGGCAAACCGGACACAGAATGGCTTGGGAT^ 

CGATAGCCCTGACGCTGAGCTCATCGAAGCCAATC^ 

TTACCGGACTGACACACATT(^OGCTCACTTTCTGT^ 

ATCtXTAAGGCrAGGAGACCCGAAGGCAGAACCTGGGCCCA 

GAGAGTGTGTGAGAAAATGGCICTGTATGAC5GTCGTGT^ 

CCATCTTTCTGCTCGCCCTCCra 

GGCGTCGCCGGAGCCCTCGTGGCTrrCAAAATCATGAGC^^ 

GTCCGACGGAAGCTiaMCACAGTGTC^^ 

TCGACTCCTTCGATCCCCTCX^l'GGCTGA 

TACCTCGTGACAAGGCIATGCCGATGGCTGTAGCGGAGGCGCrTACGAT^ 
CATTGGCACAGTGCTC^^ 

CTGGCATTGACTCTTTCAGAAAGCATCCCGAAGCCAC^ 
O^XT<XXXSGAGTC^X^TCTAT^^ 

CAGCAATACCCTCCTGAGACACCATAACCTOGTGTATATCTC^ 
TCTGTGACGTCCTGTCOGACTTTAAGACATGGCTO 
GGAGTGTGGAGGGGAGACGGAAGCGCACCCTGGATC^CACC^ 
TACCATTTTCAAA 



HepC Savine Cassette Sequences (A+B+C), with specific restriction sites removed, which can be joined
to generate a single expressible open reading frame that encodes the HepC Savine protein above
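
The check implied by "restriction sites removed" can be sketched as a scan of a cassette for recognition sequences. This is only an illustrative sketch: the specification does not state here which sites were removed, so the enzyme list below (BamHI, XhoI, SalI, EcoRI, BglII, with their standard recognition sequences) and the function name find_sites are assumptions chosen for the example. All five sites are palindromic, so scanning one strand is sufficient.

```python
# Illustrative enzyme list (assumption): standard recognition sequences only.
SITES = {
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "SalI": "GTCGAC",
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
}


def find_sites(sequence, sites=SITES):
    """Return {enzyme: [1-based positions]} for every site found in sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, ignore case
    hits = {}
    for name, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:
            positions.append(i + 1)
            i = seq.find(motif, i + 1)  # allow overlapping matches
        if positions:
            hits[name] = positions
    return hits


# The 3' end of Cassette A as printed in the listing.
tail = "ATgtcgactgagaattcgcc"
print(find_sites(tail))
```

Applied to the coding portion of a cassette, an empty result for a given enzyme would indicate that no internal site for that enzyme remains.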



Cassette A 

TATCTGAAAGGCTCCAGCGGAGGCCCT 

AGGCTGGAGGCTCCTGGCTOC^^ 

AGGAAGlXSl^CntJUSAGTCGGACTGCATGA^ 

ACATGGGGAGCCGATACCCXrreCCTGTGGCGA 

TGTGGGAATCTTTAGGGCrGCCGTCTCCACAAGGGGAGTGGC^ 

TGGGAAGGATTGTGCTCAGCGGAAAGCCTG^^ 

TCTA C CCCTCTGCCnXXXV^^ 

CGGCGATGCCCTCTAOGATtntXJItlAGCAAACI GC CTC^ 

GCCAAAGGGTCGAGTTTATCTCCTGGTGTCTCTt^ 

TGGGTCCCTCCani A ACGTCA^ 

CftG Ci i iiiuavnvrrriticrrnjc c^^ 

GCCTCCACrtXrACTCCCCCGGAGAGAT 
GATTACGAACCCCClXrit^ltX A CGGATCCCCTCIt j^ 
CTGGGCTATCAAATGGGAATACGTCCT

GGCCTCCCCTCGGCAATTGGTTTGGC^ 

ACAGAGCXTTATGACAAGGTATAGCGCTCCCCCTC 

TAGCrCCTGGCCTCTGCTCCTGCTCCTGCTOGCCCTCCCCC^ 

GCGGAGGCGTCGTCXTTCCAGCAAACCA^ 

(nXXJAGGGAGAGGTCCAGATTGTGTCCAGCTCCCCCCCTC 

TACCGGAAGCGGAAAGTCCACCAAA^rGCCTGCCGCTAACA 

GGGAAGGCGTCTTCACAGGCCTCACXXATATCGATGCCCAT^ 

ACACACGTCACCGGAGGCAATGCCGGAAGGACAACCTCCGGCCT 

CACCAAATACATTATGACATGCATGAGCGCTGACCTCGAGG 

CCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGC^ 

ATTTGCGGAAAGTATCIGTTTAACTGGGCCGTCAGGACAAAGCTC 

TCTGTCCATCGCTTACTTTAGCATGGTOGG^ 

CCGAAACCCATGTGACAAGGCTCGCCAGAGGCTCCCC^ 

TCCCTGAAAGCCAC^TGCACA<X:CAATGGCCrCgrgI^ 

ATGGGTCCCCX3GAGCCSTCTACGCTClXn , ATGGC^ 

GCATGCTGACAGACCCTAGCCATATCACAGCC^^ 

CCTGGTCCTGAATCCCTCCGTGOT 

CCGGACTGTATCA<OTCACCAATGACTGTCCCAATAGCTCC^ 
TCCTACGGATTCCAATACTCCCCOXJACAG^ 

GCCCTGCCGATGGCATGAGCCAACTGTCCGCC^ 

GAACTGATTGAGGCTAACCTCCTGTGGAACCCTGC^ 

CACCACAAGCCAAACCCTCCTGTTTAACATTCT^ 

GCTATGACACAAGGTGTTTCGATAGCACAGTGACAGAGTCt^ 

CTCAGGAAAAGCAGAAGGTTTGCCCAAGCC^^ 

TAGGATGATCCTCATGACACACTTTTTCTCCGTGCTC^ 

CCTCCGGCGATGTGGTCGTGGTCGCCACAC^ 

CATAGCAAAAAGAAATGCX^TGAGCTCGCCGCTAA 

CGACXJTCGTGCTCCCCTGTAGCTTTACCACACTGCC^ 

AtGTCCAGTATCTGTATAAGGGAAGGTGGGTGCCTGGCGCT^ 

CTCCTCGCTCTCCCTCAGAGAGCCTAra 

CCGCTATCnntgGCAAATACCTlTTCAATTG^ 

GATCTGCTCGAGGATAGCGTCACXXXrrATCGATACCA^ 

CGGCACAACCGATAGGTCCGGCGCTCCCACATACTCCT 

CCTGTGTGAGAGAGGGAAAOX?TACCAGATGC^ 

<^TTGCACAATT3CTCGTGTGTT3GCGATGACCTOT 

CAGGGCTGTGGCTGGOGCTCTGGTCGCCTTTAAGATTATGT^^ 

TGCCTGCCATTCTGTCCTACGATACCAGATGCT^^ 

TAT CAGTCTTCXXStT CT 

T(XtCCTCCX?CGTCTGCCAAGACCATCrG 

GCAA1X7TGTCCGTGGCTCACGATGGCGCTGGCAA 

TTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCrC^ 

AGGCGGAAGGCATCTGATTTTCTGTCACT 

CCGCTCreGCTGCCTATTGCCTCAGCA^ 

GAAACCGCTGG0GTCCAG6AAGACGCnX!CTCCCTGAGAG^ 

AGACCCTGGCTGGTTCACAGCXXX^TACrCCGGCGGAGA 

GGTTTTGCCTCCTGCTCAGCTCCAGCACAACCG 

GGATGCCCTCCCGATAGCGATGCCGAAAGGACACAGAGAACG^ 

TGTGGCTCCCGGAGAGAGACCCTCCGGCATGTTCGATGTGAGAA 

CCTGTAACTGGACCAGAGGCXSAAAGGTGTGACCTCGAGGAT^^ 

AATGTGAGAGGCGGAAGGGATGCCGTCATCCTCCTGATGTG 

AACCTCCC3AGAGAAGCCAACXX»GAGGCAGAAGGCAA 

CCCATGAOGGAGCOGGAAAGJUSAGTGTATTACCTCACCAGft 

GAACCCGCTCCCTCCGGCTGTCCCCCTGACTCC^^ 

AGACCCTATOSGAGGCCATTACGTCCAGATGGCCA^ 

TGACACCCCTCAC^GAcCCCTCCACCGAAGACCTCGTGAA 

GGAGTGGTCTGCGCTGCCATTCrGAGA^ 

TTTCTCCATGGTCGGCAATTGGGCTAAGGTC^ 

ATgtcgactgagaattcgcc 



Cassette B 



ggcggat ccaccatgc tcgagTCGACAACCCAAOXrrGTAACT^^ 

GGCCTC«GACATGATGATGJ^^ 

CTtCACACCTraGGAGACaCATT^^ 

TTTAAGGTCAGGATGTAOGTCGGCGGA<ntSGAACACAGACT 1 1 1 GCG T CCA GCCTGAGAAAGGCGG 

AAGGAAACCCGCTAG(X"ltJAitJb-rCl-lXX^ 

TCGGAGCCCCTCTGGGAGGOGCTGCCACAGCCCTOGCC^ 
GCTGGCAGAACCACAAGCGGACTC^TCAGCCTCCT^

ACTGGCKH13CTCAGCTGTCTGACAGTGCCTGC 

ACGATTGCCCTGGCAGAGACAAAAACCA 

ACATGCRTTAAiCGGAGTGTCTCCXSGCTACCCAACTC 

CXKXXnXTTACGTCXSGCGATCTGTGTGGCXCC^ 

CCGCTCAGGGATACAAAGTGCTCGTGCTCAACCCTAGCGTC^ 

GGAAAOGAAGGCTGTGGCTGGGCOGGATGGCTCCTGTCCCC^ 

CTACTCCTGGACAGGCGCTCTGGTCACCCCTTGC^ 

CCGCTAGCTGTGGCGGAGTGGTCCTGGTCGGCCTt^ 

TCCACCGGATTCACAAAGGTCTtX^GAGCCCCTCC^^ 

AAGCGTCGAGGAAGCC1CTAGCCTCACCCCTCCCCA 

GCCATGCCAGAATCrCCt^CATTCAGTATCTGGCTGGCCTC^ 

GCTTTOkCAGCOGCTGTGACACAGATTGTCGGA 

TACCAGAAAGACAAGCGAAAGGTCCCAGCCTCTGCATA 

ATTATCATGTTCGCTCCCACACTGTGGGCCAGAATGATTC^ 

TGTGTTTACCGATAACrCCAGCCCTCCCGCTGTGCCTC^ 

TGA£AgIGCCTCACCCTAACATTGAGGAft 

GATATCACAAAGCTCCTGCTCGCCG^ 

CACCGCTGCCCTOGTGATGGCCCAACTGCrCAGGATTC^ 

TGCTCGCCGGATGCAATACCTGTGTGACACAGACAGTGG^ 

CTCCCCOUVGACGCTGTGTCCCAOGGACCCACA^ 

ACACCCTGTGACAAAGTATATCATGACXrnntXXAGA^ 

CCCTCACCAATAGCAGAGGOGAAAACIXntSGCTATAGGAGAT^ 

CCIACCGATTGCTTTAGGAAAC^CCCTGAGGCTACCTAT 

CACCAGftCACGCTGACGTCATCXXrTGTGAGA^ 

TCCCTATCAATGCCTATACCACAGGCCCTTGCACACCCCTCC 

GATGCCACAAGCATTCTGGGAATCGGAACCGTCCTG^ 

ATACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGC» 

TGCATCACntXSAGGCCTAGCTCGGGCCCTACCGATCCC^ 

ACATGOXSATTCGCTGACCTCGGCCCT^ 

TGCCAAAAGCGTCTG05GACCCGTCTACTGTGAGGAA 

CCGAACAGTTTAAGCAAAAGGCTCTGGGACTGCTCCAGA 

CCTAGCTCGGACCAAATGTGGAAGTGTCT G ATTAGGCTCAACCCTAC 

TGGCCCTGTGTATTGCTTTACCCXrrACCCCTGTGGTCGTGGGAA 

AGCItXTGAGAAGGCTtXJACCAATGGATTAGCT 

GATGGCTCCTGXn'CCACCGTCAGCTCCGAGGC^ 

CCGTCCAGAATCTGGCTGAGCAATTCAAACAG^ 

TGCCTTCAGTCCGACGCTGC^^ 

GGAAGCGTCTTCCTCGTGGGACAGCTCTTC^ 

TGGCTGTGCCTGGTACGAACTGACACCCGCTGAGAC^ 

ATTAACTCCACOGCTCTGAATTGCAATGAGT^^ 

TAACGCTCTGTCCAACTCCCTGCTCAQGCATCACAATCTGGT^ 

AGAAAGTGACAGCCGCTATGTCCACCAATCCCAAACCC^ 

GTCAAGTTTCCCGGAGGCGGAACCCAAACC3tfU^^ 

CTGCGCTAGGGCTCA<^CTCCCCCTCCCTrCg^ 

ACAATACCAGACCCCCTCTGGGAAACTGGTTCG^ 

GAAAGCACACTGTCCACCGCrCTG^ 

ACAGAAAAAGGTCACCTTTGACAGACTGCAA^ 

GCGCTAGGCTCGT t XrrCCrGGCTACCCCT 

GTCACXTGGAATGACAACCX^T AACCTCAAGTCTCCC^ TTTACCGAACTGGATGGCGT 

CCTGAAACTGACACCCATTGCGGCIY&tXX^ 

TCTATCACTCCGCCTCCAGGCAACCCGAACTGATT ^ 

AGCCAGAGCOGCTTGCAGAGCOGCTGGOCTCTTCGATAG 
AGGTCAAGGCreCXXXTTAGCAAAGTC 

AGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAGCTGTGGC^ 
TGAGATTACOGGACACGTCAAGAATGGCACAATGAGA^ 
GCCTCCACGAATACCCTSTCQGAAGCCAA CTtXC TT G CG^ 
AAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTTC 

TGTTTGGCCCTATGCTCACOGATCCCTOCCACATTACCGCItyV^ 

TGGCCATGCOGTCGGCATTTTCACAGCOGCTGACTTTGACCA 
CX3GATCAGAGACCCTATTGCTGGC!ACTATCCCXXrTAA 

AATAGGCTCATCGCTTTCGCTAGCAGAGGCAA gagaat t cgcc 
Cassette C



ggcgga t ccaccatgc t c^agTGCCTCTGGATGATGCTCCTGATTAG^ 

TCTGAATGCCGCTAGCCTCGCCtK3AACCCATATCAT^ 

AGTCrrAGCCMCACXTTCCCCrATATCGAACAGGGAATG^ 

CTCTAOGGAGTGGGAAGCTCCATCGCTAGCTGGGOCaTTAAGTGGGAGTA 

GTTCTGTCTGCTCCTGCTCGCCGCTGGCGTC 

GCGCTTAC^TGAGCAAAGCCOVTGGCATTGACCCTAACATTAGGACAGGCCT 

GGACrcCTCAGGATTTC^XXrrCTGGCTAGGAAAATGA 

TAGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCC^ 

ACTATGAGCCTACOGCTGCCCAAACCTTTCTGGCTACCTG 

ACAAGGACAATCGCTAGCCCTTGGQCTCACAATG^^ 

AATGGAAACCAAACTGATTACCTGGGGCGCTAAGGGACCCGTCATC^ 

GAAGAGGATGAGAGAGAGATTAGCGTCCCC^CTGAGATTC^ 
CACCCCTCTGAGAGACTGGGCCCATAACGGACTGAGAGAC^ 
CCAAAGCCGTgGAtTTTATCCCJXntCAAAACCTCGAGAO 
ATTAACGCTGTGGCITACTATAGGGGACTGGATGTGTCCGTGATTCC^ 
TATCTCCGCCGATCTGGAAGTGGTCACCTCCAOCTGGGTGC^ 
TCTCCACCGGAGCCCTCATGACAQGCrATACC^ 
CTgGAtTTTAGCCTOGACCCTAACACAAACAGAAGGCCTCAGGAT^ 
AOTCTATCTtXTTCCCCAGAAGGGGACCCAGAA 
CCACAGGCAATCTGCtrroGCTGTAGCTTTAGCAl-rrr^ 
GCTTyGGAAAGCCGTCGCCCATATCAATAGCGTCTGG 
CATC^^TACCAATGGCTCCTGGCATATCAATAGCACAGCCCTC^ 
GCCTCTTCTATCAGCATAAGTTTAACTCXIAGCGGATGCCCTG 
CTCTTCCTCCTGCTCGCCGATGCCAGAGTff 
CGACTCTGAGATTTACGGAGCCTGTTACTC 
GDXTTIXrrCCTGGACAGTGTATCACGGAGCCGGAAC 
ACAAACGTgGAtCAAGACCTCTACAGATTCGTCGCCCCT^ 
TGAGTGTTACGATGCCGGATGCGCTTGGTATAGGTCCGAGCTC^ 
TGCCTTGCTCC 



CATAGGGCTAGGTCCGTGAGAGCCAGACPGCTCGCCAGAGG 

CTTCAATATCCTCGGCGGATGGGTCGCCGCTCAGCTC^ CTCCAGGCTA 

GCCTCCTGAAAGTGCCTTACTTTGTGAGAGTGC^^ 

GGAACX&TGAGGATTGTGGGACCCAG^ 

AGAGGTOGCCCTCAGCACAACOGGAGAGATTCCCTTTTAOGGAAA^ 

TCCTGGCTCGGCAATATCATTAGGGTCAGCGCT^ 

AGGCATGA(XACAGACAATCTGAAATCCCCltXXX;TC^ 

CCCCTCCCAGAAAGAAAAGGACAGTGGTCCTGACAGAGTC 

TTTGGCTCCAGCTCCACCTCCGGCATTACCGGAGAC^ 

CTGGAGAGGCGATCGCATTATCCATACCAGATG^ 

TGTTTACCTTTA(XXCTAGGAGACACT^ 

GCTGGCCTCGCTOGAGCCGCTATCGGAA^^ 

AGACATTTGGGATTGGATTTGCGAAGTGCTC^ 

GCATTCC C T1T A ACTCCACCATTCTGTATGAGGCTGCCGATG 

GAAGGCAATGCCTCCACglCTAGCTCCGGCT 

ATGGGGACCCATTAGCTATGCCAATGG^ 

GGGTOGCCATTAAGTCCCTGACAGAGAGACTUTATt^ 

TATGCCCAACAGACAAGGGGACTGCTCGGCTGTATCATTA 

ACTGGAACAGGCTCTGGATTGCGAAATCTATGGC^CTTGCT 

AGTTTTTCACAGAGCTCGACGGA<^^ 

ATTAAGGCTAGGGCTGCCTGTAGGGCTtaXJGGACTGCAA 

TATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAG 

CTCAGCACAACCCAATGGCAAACCGGACACAGAATGGCTTGGG^ 
CATGGCTCAgCTCCTGAOAATCCCTCAGGCTCCO^ 
ACGTCGGCCCTOGCGAAGGCGCTGTGCAATGGATGAACAGACA 
CTCTGGAGACAGGAAATtX5GAGGCAATATCACAAGGGTC^ 

TGACGCTCACTTTCTC^CCCACA^AAAGCAAACCGGftGAGA AT r i tXCTi ' A CCTCGTG G CTAGGGGAAGGAGACAGCCTA 

TCCCTAAGGCTAGGAGACCCGAAGGCAGAACCItaSGCC^ 

TTTCCCGATCTCGGACTGAGJ^ 

ATACGCTACCGGAAACCTCOCOGGATGCTCCTTCTCC^ 
GCGCTTACCAACTGGGAAAGGTCCTGGT^GAtATKTroGCTGGCTATG 
AAAATCATGAGCGGAGAGGTCAGCTATAGCTCCATGCCTCCC^ 
AAGCTGGAGCACAGTGTCCAGCGAAGCCGGAAGGCA 

CCTTCCACATGCGGAAGCTCCGACCTCTACCTOff 

TATCTGTGACGAATGCCATAGCACAGACGCTACCTCC^ 

(XACACroCCTCAGGATGCCGTCAIXAGAACCCAA^ 

AGAAAGCATCCa3AAGCCACATACTCCA<X;TGl\^ 

TCTGGCTCXXGGAGTGGGAAlXVATCn^^ 
AACTGCCrATCAATGCCCTCAGC^

TGCTCCGGCTCCTC5GCTCAGG6ATATCTCGGACTGGATCTGTGAGGTC 
GCTCATGCCTC^GCT CC CCGGAATCtX^^ 
GACCCTGGATCACACCCAGATGCCTCGTGGA 
TTCAAAagatctTGAgtcgacgaattcgcc 
Melanoma Savine design

Two Savines - one containing scrambled melanocyte differentiation Ags
- one containing scrambled melanoma cancer specific Ags

Genes in melanocyte differentiation Savine 

gp100

MDLVLKRCLLHLAVIGALLAVGATKVPRNQDWLGVSRQLRTKAWNRQLYPEWTEAQRLDCWRGGQVSLKVSNDGPTLI

GANASFSIALNFPGSQKVLPDGQVIWVNNTIINGSQVWGGQPVYPQETDDACIFPDGGPCPSGSWSQKRSFVYVWKTW
GQYWQVLGGPVSGLSIGTGRAMLGTHTMEVTVYHRRGSRSYVPLAHSSSAFTITDQVPFSVSVSQLRALDGGNKHFLR
NQPLTFALQLHDPSGYLAEADLSYTWDFGDSSGTLISRALVVTHTYLEPGPVTAQVVLQAAIPLTSCGSSPVPGTTDG

HRPTAEAPNTTAGQVPTTEVVGTTPGQAPTAEPSGTTSVQVPTTEVISTAPVQMPTAESTGMTPEKVPVSEVMGTTLA
EMSTPEATGMTPAEVSIVVLSGTTAAQVTTTEWVETTARELPIPEPEGPDASSIMSTESITGSLGPLLDGTATLRLVK
RQVPLDCVLYRYGSFSVTLDIVQGIESAEILQAVPSGEGDAFELTVSCQGGLPKEACMEISSPGCQPPAQRLCQPVLP
SPACQLVLHQILKGGSGTYCLNVSLADTNSLAVV
FSVPQLPHSSSHWLRLPRIFCSCPIGENSPLLSGQQV

MART 

MPREDAHFIYGYPKKGHGHSYTTAEEAAGIGILTVILGVLLLIGCWYCRRRNG
FDHRDSKVSLQEKNCEPVVPNAPPAYEKLSAEQSPPPYSP

TRP-1 

PAFLTWHRYHLLRLBKDMQEMLQEPSFSLPYWNFATG 
YDTLGTLCNSTEIX3PIRRNPAGNVARPMVQRLPEP<^VAQC1*EV 

RSLHNIiAHLFLNGTGGQTHLS S QDPI FVLLHTFTDAVFDEWLRRYNADI STFPLENAP IGHNRQYNMVPFWPPVTNTE 
MFVTAPDNLGYTYE 

Tyros 

MLLAVLYCXLWSFQTSAGHFPRACVSSKNIJ1EKECCP 

S WPS VFYHRTCQCSGNFMGFNCGNCKFGFWGPNCTBRRLLVRRNI FDLS APEKDKFFAYIiTLAKHTIS SD YVI P IGTY 

GQMKNGSTPMFNDINIYDI^VWMHYWSMDAIX^ 

YWDWRDAEKOTICTDEYMGGQHPTNPNLLSPASFFSSWQW^ 

PS SADVEFCLSLTQYESGSMDKAANFS FRNTLEGFAS PLTG I ADASQS SMHNALHI YMNGTMSQVQGSANDPI FLLHH 
AFVDS I FEQWLQRHRPI^EVYPEANAPIGHNRES YMVPFI PLYRNGDFFI SSKDLGYDYS YLQDSDPDSFQDYI KS YI» 
EQASRIWSWLLGAAMVGAVLTALIiAGLVSLLC^ 

TRP2 

msplwwgfllscuxtkjlpgaqgqfprvcmtvds 

nqddrelwprkffhrtckctgnfagyncx5ix:kfgvtopncerkkppvirc^ihslspqe^ 

vittqhwi^llgpngtqpqfancsvydffvwijr^ 

ignbsfai^ywnfatgrnbotvctoqiifg 

grnsmklptlxdirdclslqkfdnppffqnstfsfrnalegfdkadgtl^ 
ifwlhs ftdai fdewmkrfnppadawpqelapighnrmynmvpffppvtneelflts 

WPTTLLVVMGTLVALVGLFVIJAFIjQY SKRYTEEA 
MC1R 

MAVQGSQRRLLGSLNSTPTAI PQLGLAANQTGARCLEVS I SDGL FLSLGLVSLVENALWATI aknrnlhs pmycfi c 
CLALSDLLVSGTNVLETAVI IJoLEAGAJUVARAAVI^LDNVIDVITCS SMLSSLCFLGAI AVDR YI S I FYALRYHS IV 
TLPRAPRAVAAIWASVVFSTLFIAYYDHVAVl^CL^ 

FGLKGAVTLTILLGI FFLCWGPFPLHLTLI VLCPEHPTCGCI FKNFNLFIiALI I CNAI IDPLI YAFHSQELRRTLKEV 
LTCSW 

MUC1F 

MTPGTQSPFPLLIiLLT\^TVVTGSGHASSTPGGEKETSATQRSSVPSSTEKNAV 
TLAPATEPASGS AATWGQDVTS VPVTR PALG STTPPAHDVTS APDNK 
MUC1R

NRPALGSTAPPVHNVTS ASGSASGSASTLVHNGTSARATTTPASKSTPFS I PSHHSOTPTTLASHSTKTDASSTHHSS 

VTPLTSSNHSTSPQLSTGVSFFFLSFTIISNIjQFNSSLED^ 

WQLTIAFREGTINV1IDVETQFNQYKTEAASRY 

L I ALAVCQCRRKNYGQLD I FPARDTYHPMS EYPTYHTHGRYVPPS STDRS PYEKVS AGNGGSSLS YTNPAVAAAS ANL 



NB: MUC1 repeat sequences in the middle of the gene were removed



Genes in melanoma specific Savine

BAGE

MAARAVTLALSAQLLQARLMKEESPVVSWRLEPEDGTALCFI F 
GAGE-1

MSWRGRSTYRPRPRRYVEPPEMIGPMRPBQFSDEV^PATPEEGEPATQRQDPAAAQEGEDEGASAGQGPKPEADSQBQ 
GHPQTGCBCEDGPIX^EMDPPNPEBvTeTPEEEMRSHYVAW 

gp100In4

SWSQKRSFVYVWKTWGEGLPSQPI IHTCVYFFLPDHLSFGRPFHLNFCDFL 
MAGE-1

MSLEQRSLHCKPEEALEAQQEALGLVCVQAATSSSSPLVIX3TLEEVTTAGSTDPPQSPQG 

GSSSREEEGPSTSCI LESI*FRAVITKKW\DLVGFXiIJjKYRAREP VTKAEMLESVI KNYKHCFPE I FGKASBSLQLVFG 
IDVKEADPTGHS YVLVTCLGLS YDGLIoGDNQ IMPKTGFLI I VLVM I AMEGGHAPEEE I WEELS VMEVYDGREHS AYGE 
PRKLLTQDLVQEKYLEYRQVPDSDPARYEFLWGPRALAETSYVlCvTiEWIKVSARVRF 

MAGE-3

MPLEQRSQHCKPEBGLRARGEAIjGLVGAQAPATEEQEAASSSST^ 
LWSQSYEDSSNQEEEGPSTFPDIJBSEFXJAAI^Rja/AELra 

SliQLVTGIELMEVDPIGHLYIFATCLGLSYIXn^IiGDNQIMPKAGLLI IVIiAI IAREGDCAPEEKIWEELSVLEVFEGR 

EDS ILGDPKKLLTQHFVQENYI^YRQWGSDPACYEFL^ 

EE 

PRAME

MERRRLWGSIQSRYISMSWTSPRRLV^IAGQSIJiKDEAIAIAALBIiLPRBLFPP 
TCLPLGVLMKGQHLHLETFKAVIilXS^ 
KKRKVTXn^TEAEQPFIPVEVT,VT)LFIJCBGA^ 
IEDLEVTCTWKLPTIAKFSPYIX^ 

LDQLIJmVMl^LETI^ITOCRLSEGDV^fflLSQSPSVSQLSv^ 

TDDQLIJUiLPSLSHCSQLTTLSFYGNSISISAI^SIXQHLIGLSNIiTHV^ 

EIJ.CSI^RPSMVWLSANPCPHCXX)RTFYDPEPILCPCFMPN 

TRP2IN2 

LMETHLS S KR YTEEAGGFFPWLKVYTYRFVIGIiRVWQWEVI S CKL I KRATTRQP 
NYESO1a

MQAEGRGTGGSTGDADGPGG PG I PDGPGGNAGGPGEAGATGGRGPRGAGAARASG PGGGAPRGPHGGAASGIiNGCCRC 
GARGPESRLIJ2FYIiAMPFATPMEABLARRSLAQDAPPLPVPGV^ I S SCLQQL 

SLLMWITQCFLPVFLAQPPSGQRR 

NYNSOlb 

MLMAQEALAFLMAQGAMIJ^QERRvTRAAEvPGA 
LAGE1
MQAEGQGTGGSTGDADG PGGPG I PDGPGGNAGGPGEAGATGGRGPRGAGAARASGPR6GAPRGPHGGAAS AQDGRCP C

GARRPDSRIiLQLHITMPFSSPMEAELVRRILSRDAAPLPRPGA^ 

SLLMWI TQCFLPVPLAQAPSGQRR 



Differentiation Savine Scramble process 

Disease name : melanoma 

Input filename : Diffmucg.txt 

Output filename : Diffmucs.txt 

Number genes : 8

Number segments : 187 

Segment length : 30 

Segment overlap : 15
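
The segmentation implied by these parameters (segments of 30 residues, each overlapping the next by 15, so that offsets advance 1, 16, 31, ...) can be sketched as follows. This is a minimal illustration and not the program used to generate the listing: the function name overlapping_segments is hypothetical, and the handling of a final window shorter than 30 residues is an assumption. The example input is the start of gp100 with the two leading alanines that appear in segment 1 of the listing below.

```python
def overlapping_segments(sequence, length=30, overlap=15):
    """Return (offset, segment) pairs; offsets are 1-based as in the listing."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    segments = []
    start = 0
    while start < len(sequence):
        segments.append((start + 1, sequence[start:start + length]))
        if start + length >= len(sequence):
            break  # assumption: a final, possibly shorter, window is kept
        start += step
    return segments


# Two leading alanines plus the first 58 residues of gp100 (60 residues total).
padded = "AAMDLVLKRCLLHLAVIGALLAVGATKVPRNQDWLGVSRQLRTKAWNRQLYPEWTEAQRL"
for offset, segment in overlapping_segments(padded):
    print(offset, segment)
```

With a 15-residue overlap, every residue away from the ends of a gene appears in two segments.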

Segments in original order: 

Gene : gp100

Segment# : 1
Offset : 1
1st Codon : 1

AAMDLVLKRCLLHLAVIGALLAVGATKVPR 
GCXtX^PATGGATCTGGTCCTGAAAAGGTGTCTGCTCCACCTC 



Gene : gp100

Segment# : 2
Offset : 16
1st Codon : 1

VIGALLAVGATKVPRNQDWLGVSRQLRTKA
GTGATTGGCGCTCTGCT(XSCCGTCGGCGCTACCA 

Gene : gp100

Segment# : 3
Offset : 31
1st Codon : 1

NQDWLGVSRQLRTKAWNRQLYPEWTEAQRL
AACXAAGACTGGCTGGGAGTGTCCAGGCAACTGA 

Gene : gp100

Segment# : 4
Offset : 46
1st Codon : 1

WNRQLYPEWTEAQRLDCWRGGQVSLKVSND
TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGAG^ 

Gene : gp100

Segment# : 5
Offset : 61
1st Codon : 1

DCWRGGQVSLKVSNDGPTLIGANASFSIAL
GACTGTTGGAGAGGCGGACAGGTCAGC£TC3kAGGTCAG 

Gene : gp100

Segment# : 6
Offset : 76
1st Codon : 1

GPTLIGANASFSIALNFPGSQKVLPDGQVI
GGCCCTACCCTCATCGGAGCCAATGCCTCCrTC^ 

Gene : gp100

Segment# : 7
Offset : 91
1st Codon : 1

NFPGSQKVLPDGQVIWVNNTIINGSQVWGG

ATCTGGGTGAATAACACAATCATTAACGGAAGCCAAGTGTGG 
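
Each segment pairs a peptide with its coding DNA, so a legible stretch of DNA can be checked against the peptide by translating it with the standard genetic code. The sketch below is illustrative only: the function name translate is hypothetical, and reading "1st Codon" as the 1-based position of the first base of the first codon is an assumption, not something the listing states. The example input is the legible DNA fragment printed for segment 7.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, first_codon=1):
    """Translate dna from the given 1-based start; unreadable codons give 'X'."""
    dna = dna.upper()
    start = first_codon - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Legible DNA fragment printed for segment 7.
fragment = "ATCTGGGTGAATAACACAATCATTAACGGAAGCCAAGTGTGG"
print(translate(fragment))
```

The translation of this fragment corresponds to residues 15 to 28 of the segment 7 peptide, which is consistent with the first part of the DNA line having been lost in reproduction.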



Gene : gp100

Segment# : 8
Offset : 106
1st Codon : 1

WVNNTIINGSQVWGGQPVYPQETDDACIFP
TGGGTCAACAATACCATTATCAATGGCT^^ 

Gene : gp100

Segment# : 9
Offset : 121
1st Codon : 1

QPVYPQETDDACIFPDGGPCPSGSWSQKRS
CAGCCTGTCTATCCCCAAGAGACAGAQjATGCCTGTA 

Gene : gp100

Segment# : 10
Offset : 136
1st Codon : 1

DGGPCPSGSWSQKRSFVYVWKTWGQYWQVL
GACXSGAGGCCCTTGCCCTAGCGGAAGCTGGAGCCAAAAGAGAAG^^ 

Gene : gp100

Segment# : 11
Offset : 151
1st Codon : 1

FVYVWKTWGQYWQVLGGPVSGLSIGTGRAM 
TTanCTACGTCTGGAAAACCTGGGGCCAATACTGG 

Gene : gp100

Segment# : 12
Offset : 166
1st Codon : 1

GGPVSGLSIGTGRAMLGTHTMEVTVYHRRG
GGCTGGACCCOTCAGCGGACTGTCC^TCGGA^ 

Gene : gp100

Segment# : 13
Offset : 181
1st Codon : 1

LGTHTMEVTVYHRRGSRSYVPLAHSSSAFT
C"l l <XkiAACCCATACCATGGAGGTCACCG* rCrACCATAGGAGAGGCTCCAGGTCCTACGTCCCCCT CGCCCATAGCTCCAGCGCTTTCACA 

Gene : gp100 

Segment# : 14 
Offset : 196 
1st Codon : 1 

SRSYVPLAHSSSAFTITDQVPFSVSVSQLR 
AGCAGAAGCTATGTGCCTCTGGCTCAC^^ 

Gene : gp100 

Segment# : 15 
Offset : 211 
1st Codon : 1 

ITDQVPFSVSVSQLRALDGGNKHFLRNQPL 
ATCACAGACCAAGTGCCTTTCTCCGTG^^ 

Gene : gp100 

Segment# : 16 
Offset : 226 
1st Codon : 1 

ALDGGNKHFLRNQPLTFALQLHDPSGYLAE 
GCXXrnXAa3GAGGCAATAAGCArriXX-lt^ 

Gene : gp100 

Segment# : 17 
Offset : 241 
1st Codon : 1 

TFALQLHDPSGYLAEADLSYTWDFGDSSGT 
ACCTTTGCCCTCCAGCTCCACGAT^^ 

Gene : gp100 

Segment# : 18 
Offset : 256 
1st Codon : 1 

ADLSYTWDFGDSSGTLISRALVVTHTYLEP 
GCCGATCTGTCCTACACATGGGATTTCGG 

Gene : gp100 

Segment# : 19 
Offset : 271 
1st Codon : 1 

LISRALVVTHTYLEPGPVTAQVVLQAAIPL 
CTGATTAGCAGAGCCCTCGTGGTCACCCATACCTATCTGGA^ 

Gene : gp100 

Segment# : 20 
Offset : 286 
1st Codon : 1 

GPVTAQVVLQAAIPLTSCGSSPVPGTTDGH 
GGCCCTGTGACAGCCCAAGTGGTCCTGCAAGCCG 

Gene : gp100 

Segment# : 21 
Offset : 301 
1st Codon : 1 

TSCGSSPVPGTTDGHRPTAEAPNTTAGQVP 
ACcrccTGCGGAA<xrrccccxxrrcccra 

Gene : gp100 

Segment# : 22 
Offset : 316 
1st Codon : 1 

RPTAEAPNTTAGQVPTTEVVGTTPGQAPTA 
AGGCCTACCGCTGAGGCTCCCAATACCACAGCC^^ 

Gene : gp100 

Segment# : 23 
Offset : 331 
1st Codon : 1 

TTEVVGTTPGQAPTAEPSGTTSVQVPTTEV 
ACCACAGAGGTCGTGGGAACCACACCCGGACAGGCTCC^ 

Gene : gp100 

Segment# : 24 
Offset : 346 
1st Codon : 1 

EPSGTTSVQVPTTEVISTAPVQMPTAESTG 
GAGCCTAGCGGAACCACAAGCGTCCAGGTCCC^ 

Gene : gp100 

Segment# : 25 
Offset : 361 
1st Codon : 1 

ISTAPVQMPTAESTGMTPEKVPVSEVMGTT 

Gene : gp100 

Segment# : 26 
Offset : 376 
1st Codon : 1 

MTPEKVPVSEVMGTTLAEMSTPEATGMTPA 
ATGACACCCGAAAAGGTCCCCGTCAGCGAAGT^ 

Gene : gp100 

Segment# : 27 
Offset : 391 
1st Codon : 1 

LAEMSTPEATGMTPAEVSIVVLSGTTAAQV 
CTGGCTGAGATGAGCACACXXGAAGCCACAGGC^ 

Gene : gp100 

Segment# : 28 
Offset : 406 
1st Codon : 1 

EVSIVVLSGTTAAQVTTTEWVETTARELPI 

GAGGTCAGCATTGTGGTCClX7rCXXJUCAG^ 



Gene : gp100 

Segment# : 29 
Offset : 421 
1st Codon : 1 

TTTEWVETTARELPIPEPEGPDASSIMSTE 
ACCACAACOGAATGGGTCGAGACAAeOX^^ 

Gene : gp100 

Segment# : 30 
Offset : 436 
1st Codon : 1 

PEPEGPDASSIMSTESITGSLGPLLDGTAT 
CCCGAACCCGAAGGCCCTGACGCTAGCTCCAT 



Gene : gp100 

Segment# : 31 
Offset : 451 
1st Codon : 1 

SITGSLGPLLDGTATLRLVKRQVPLDCVLY 
AGCATTACCQ 
^TTGCGTCCTGTAT 

Gene : gp100 

Segment# : 32 
Offset : 466 
1st Codon : 1 

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI 
ATTGTGCAAGGCATT 

Gene : gp100 

Segment# : 33 
Offset : 481 
1st Codon : 1 

RYGSFSVTLDIVQGIESAEILQAVPSGEGD 
AGGTATGGCTt 
ttATCGTCCAGGGAATCGAAAGCGCTGAGAT 

Gene : gp100 

Segment# : 34 
Offset : 496 
1st Codon : 1 

ESAEILQAVPSGEGDAFELTVSCQGGLPKE 
GAGTCCGCCG/ 
LAATCCTC 
rTTTCGAACTGACAGTGTCXTGCCAAG 

Gene : gp100 

Segment# : 35 
Offset : 511 
1st Codon : 1 

AFELTVSCQGGLPKEACMEISSPGCQPPAQ 
GCCrriXJAGCTCACCGTCAGCTGTC 

Gene : gp100 

Segment# : 36 
Offset : 526 
1st Codon : 1 

ACMEISSPGCQPPAQRLCQPVLPSPACQLV 
GCCTGTATGGAAATCTCCAGCCCTGGCTGTCAGCCTCCCGCTCA^ 

Gene : gp100 

Segment# : 37 
Offset : 541 
1st Codon : 1 

RLCQPVLPSPACQLVLHQILKGGSGTYCLN 
^TCCTCAAGGGAGGCTCCGGCACATACTGTCTGAAT 



Gene : gp100 

Segment# : 38 
Offset : 556 
1st Codon : 1 

LHQILKGGSGTYCLNVSLADTNSLAVVSTQ 
CTGCATCAGATTCTGAAAGGCGGAAGCGGAACCITVTTGC^ 

Gene : gp100 

Segment# : 39 
Offset : 571 
1st Codon : 1 

VSLADTNSLAVVSTQLIMPGQEAGLGQVPL 
GTGTCCCTGGCTGACACAAACTCCCTGGCTC 

Gene : gp100 

Segment# : 40 
Offset : 586 
1st Codon : 1 

LIMPGQEAGLGQVPLIVGILLVLMAVVLAS 
CTGATTATGCCTGGCCAAGAGGCTGGCCTCG^ 

Gene : gp100 

Segment# : 41 
Offset : 601 
1st Codon : 1 

IVGILLVLMAVVLASLIYRRRLMKQDFSVP 
ATCGTCGGCATTCTGCTCGTGCT(^TGG 

Gene : gp100 

Segment# : 42 
Offset : 616 
1st Codon : 1 

LIYRRRLMKQDFSVPQLPHSSSHWLRLPRI 
CTGATTTACAGAAGGAGACTGATGAAGCAAGACTTTAG^ 

Gene : gp100 

Segment# : 43 
Offset : 631 
1st Codon : 1 

QLPHSSSHWLRLPRIFCSCPIGENSPLLSG 
CAGCTCCCCCATAGCT(X»GCCATTGGCrCAGG 

Gene : gp100 

Segment# : 44 
Offset : 646 
1st Codon : 1 

FCSCPIGENSPLLSGQQVAA 
TTCTGTAGCTGTCCCATTGGCGAA 

Gene : MART 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAMPREDAHFIYGYPKKGHGHSYTTAEEAA 
GCCGCTATGCXTOGGGAAGACGCTCACTTT^TC^ 

Gene : MART 

Segment# : 2 
Offset : 16 
1st Codon : 1 

KKGHGHSYTTAEEAAGIGILTVILGVLLLI 
AAGAAAGGCCATGGCCATAGCTATACCACAGCCGAA 

Gene : MART 

Segment# : 3 
Offset : 31 
1st Codon : 1 

GIGILTVILGVLLLIGCWYCRRRNGYRALM 
GGCATTGGCATTCTGACAGTGATTCTGGGAGT 

Gene : MART 

Segment# : 4 
Offset : 46 
1st Codon : 1 

GCWYCRRRNGYRALMDKSLHVGTQCALTRR 
GGCTGTTGGTATTGCAGAAGGAGAAACGGATACAGAGCCCTCATGGAT^^ 

Gene : MART 

Segment# : 5 
Offset : 61 
1st Codon : 1 

DKSLHVGTQCALTRRCPQEGFDHRDSKVSL 
TGTCCCCAAGAGGGATTCGATCACAGAGACTCCAAGGTCAGCCTC 

Gene : MART 

Segment# : 6 
Offset : 76 
1st Codon : 1 

CPQEGFDHRDSKVSLQEKNCEPVVPNAPPA 
TGCCCTCAGGAAGGCTTTGACCATAGGGATAG 

Gene : MART 

Segment# : 7 
Offset : 91 
1st Codon : 1 

QEKNCEPVVPNAPPAYEKLSAEQSPPPYSP 
CAGGAAAAGAATTGCGAACCCGTCGTGCCTAACGCTCCCCCTG^ 

Gene : MART 

Segment# : 8 
Offset : 106 
1st Codon : 1 

YEKLSAEQSPPPYSPAA 
TACGAAAAGCTCAGCGCTGAGCAAAGCCCTCrrc 

Gene : TRP-1 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAPAFLTWHRYHLLRLEKDMQEMLQEPSFS 
GCCGCTCCCGCTTTCCTCAC^ 

Gene : TRP-1 

Segments : 2 
Offset : 16 
1st Codon : 1 

LBKDMQBMLQBPSFSLPYHNFATGKHVCDI 
CroGAAAAGGATATGCAAGAGATGCTGCAAGAGCCTAG 

Gene : TRP-1 

Segments : 3 
Offset : 31 
1st Codon : 1 

L P Y W H F ATGRN VCD ICTDD LMGSRSNFDST 
CPGCCITACraSAACTTTGCaWa^ 

Gene : TRP-1 

Segments : 4 
Offset : 46 
1st Codon : 1 

CTDDLMGSRSNPDSTLISPNSVFSQWRVVC 
TGCACAGACX^TCrGATGGGCrCCAGGTCCAACTT^ 

Gene : TRP-1 

Segment* : 5 
Offset : 61 

1st Codon : 1 

LISPHSVFSQWRVVCDSLBDYDTLGTLCHS 
CTGATTAGCCCTAACTOCGTGTTTAGCCAATGGAG 

Gene : TRP-1 

Segments : 6 
Offset : 76 
1st Codon : 1 

DSLBDYDTLGTLCNSTBDGPIRRNPAGNVA 
GACTCCCTGGAAGACTATGACACACIGGGAACCCTCT 

Gene : TRP-1 

Segments : 7 
Offset : 91 
1st Codon : 1 

TBDGPIRR » P AGHVARPMVQRLPBPQDVAO. 
ACCGAAGACGGACtrATTAGGAGAAACCCTGCOGGAAA^ 
Gene : TRP-1 

Segment# : 8 
Offset : 106 
1st Codon : 1 

RPMVQRLPEPQDVAQCLEVGLFDTPPFYSN 
AGGCCTATGGTCCAGAGACTGCCTGAGCCTCA^ 

Gene : TRP-1 

Segment# : 9 
Offset : 121 
1st Codon : 1 

CLEVGLFDTPPFYSNSTNSFRNTVEGYSDP 
TGCCTCGAGGTCGGCCTCTTCGATACCCCTCCCTTT^ 

Gene : TRP-1 

Segments : 10 
Offset : 136 

1st Codon : 1 

S T N S P RNTVBGYSD PTGKYDPAVRSLHNLA 
AGCACAAACTCCTTCAGAAACACAGTGGAAGGCTAT 



Gene : TRP-1 

Segment# : 11 
Offset : 151 
1st Codon : 1 

TGKYDPAVRSLHNLAHLFLNGTGGQTHLSS 
ACCGGAAAGTATGACCCTGCCGTCAGGTCCXTTGCATA^ 



Gene : TRP-1 

Segment# : 12 
Offset : 166 
1st Codon : 1 

HLFLNGTGGQTHLS SQDPIFVLLHT FTDAV 
CA<X!TCrTCCrCAACGGAACCGGAGGCCAAACCCATCTGT^ 

Gene : TRP-1 

Segment# : 13 
Offset : 181 
1st Codon : 1 

QDPIFVLLHTFTDAVFDEWLRRYNADISTF 
tTTCACAGACGCTGTGTTTGACGAATGGCTCAGG 

Gene : TRP-1 

Segment# : 14 
Offset : 196 
1st Codon : 1 

FDEWLRRYNADISTFPLENAPIGHNRQYNM 
TTCGATGAGTGGCTGAGAAGGTATAACGCroACATTAGCA^ 

Gene : TRP-1 

Segments : 15 
Offset : 211 
1st Codon : 1 

P L B N A P I G HNRQYNMVPFWP PVTNTBH PVT 
CCCCTCGAGAATGCCCCrrATCGGACIACAATAGGCAA 

Gene : TRP-1 

Segments : 16 
Offset : 226 
1st Codon : 1 

V P F W P P V TMTBMPVTAPDHLGYTYBAA 
GTGCCTTTCTGGCCCCCTGTGACAAACACAG^ 

Gene : Tyros 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAMLLAVLYCLLNSFQTSAGHP PRACVSSK 
GCaxrrATGCTCCTG GC rGT GC TCT A CTGTC^^ 



Gene : Tyros 

Segment# : 2 
Offset : 16 
1st Codon : 1 

QTSAGHFPRACVSSKNLMEKECCPPWSGDR 
CAGACAAGCGCTGGCCATTTCOCTAGGGCTTGCGTC^ 

Gene : Tyros 

Segments : 3 
Offset : 31 
1st Codon : 1 

N L M E K B C C PPWSGDRSPCGQLSGRGSCQNI 
AACCTCATGGAAAAGGAATGCTGTCCCCCTTGGTCCGGCGATAGGTC 

Gene : Tyros 

Segment# : 4 
Offset : 46 
1st Codon : 1 

S P C GQLSGRGSCQNILLSNAPLGPQFPFTG 
AGCCCTTGCGGACAGCTCAGCGGAAGGGGAAGCTC 

Gene : Tyros 

Segments : 5 
Offset : 61 
1st Codon : 1 

I/LSNAPLGPQFPFTGVDDRESWPSVPYNRT 
CnXTrCAGCAATGCCCCTCTGGGACCCCAATTCCCTTT^ 

Gene : Tyros 

Segments 6 
Offset : 76 
1st Codon : 1 

VP D R B S W P S V FY NRTCQCSGNFMGFNCGNC 
GTGGATGACAGAGAGTCCTGGCCTAGCGTCTTCTATAACAGAA 

Gene : Tyros 

Segments : 7 
Offset : 91 
1st Codon : 1 

CQCSGNPM G F NCGNCKFG PWGPNCTBRRLL 
TGCCAATGCTCCGGCAATTTCATGGGCTTT^ 

Gene : Tyros 

Segments : 8 
Offset : 106 
1st Codon : 1 

KFGFWGPHCTBRRLLVRRWI FDLSAPBKDK 
AAGTTTGGCTTTTGGGGACCCAATTGCACAGAGAG^ 

Gene : Tyros 

Segments : 9 
Offset : 121 
1st Codon : 1 

V R R N 1 F P S ^ P B KDKFFAYLTLAKHTISSD 

GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAG 

Gene : Tyros 

Segments : 10 
Offset : 136 
1st Codon : 1 

FFAYLTLAKHTISSDYVI PIGTYGQHKNGS 
TTCTTTGCCTATCTGACACTGGCTAAGCATACCAT^^ 

Gene : Tyros 

Segment# : 11 
Offset : 151 
1st Codon : 1 

YVIPIGTYGQMKNGSTPMFNDINIYDLFVW 

TACGTCATCCCTATtXGAACTCTATGGCCAAATGA 



Gene : Tyros 

Segment# : 12 
Offset : 166 
1st Codon : 1 

TPMFNDINIYDLFVWMHYYVSMDALLGGSE 
ACCCCTATGTTTAACGATATCAATATCTATGACCTCTTCGTCTGGATGCA 
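
Each record pairs a peptide segment with a nucleotide line, so one way to check a pair is to translate the nucleotides with the standard genetic code and compare the result against the start of the peptide. The sketch below assumes the standard code applies to these synthetic sequences. The helper names `translate` and `matches_peptide` are illustrative and are not part of the original program.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons only; unreadable codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


def matches_peptide(peptide: str, dna: str) -> bool:
    """True if the translated nucleotides are a prefix of the peptide."""
    return peptide.startswith(translate(dna))


pep = "TPMFNDINIYDLFVWMHYYVSMDALLGGSE"
dna = "ACCCCTATGTTTAACGATATCAATATCTATGACCTCTTCGTCTGGATGCA"
print(translate(dna))             # TPMFNDINIYDLFVWM
print(matches_peptide(pep, dna))  # True
```

Because many nucleotide lines in this listing are truncated, the check compares only a prefix. A damaged character inside a codon shows up as `X` and makes the comparison fail, which is a quick way to locate extraction errors.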

Gene : Tyros 

Segment # : 13 
Offset : 181 
1st Codon : l 

M H Y Y V S MDALLGGSEIWRDIDPAHEAPAFL 
ATGCATTACTATGTGTCCATGGATGCCCTrcTGGGAGGCT 

Gene : Tyros 

Segment# : 14 
Offset : 196 
1st Codon : 1 

IWRDIDFAHEAPAFLPWHRLFLLRWEQEIQ 
^CTGGAGGGA 

Gene : Tyros 

Segment# : 15 
Offset : 211 
1st Codon : 1 

P W H R Is F L L R W E Q E IQKLTGDENPTI PYNDN 
CCCTGGCACAGAC1XTITTCTGCTCAG 

Gene : Tyros 

Segments : 16 
Offset : 226 
1st Codon : 1 

K L T G D E W P T IPYWDWRDAEKCDI CTDBYMG 
AAGCTCACCGGAGACGAAAACTTTACCATTCCCTATTGGGATTGGAG 

Gene : Tyros 

Segments : 17 
Offset : 241 

1st Codon : 1 

R ° A E K ,_ C __ D ICTDBYMGGQH PTNPNI.LSPASP 
AGGGATGCCGAAAAGTGTGACATTTGCACAGACG^ 

Gene : Tyros 

Segments : 18 
Offset : 256 
1st Codon : 1 

GQHPTNPNLLSPASFPSSWQIVCSRLEBYN 
GGCCAACACCCTACCAATCCCAATCrGCTCAGCXXr^ 

Gene : Tyros 

Segments : 19 
Offset : 271 
1st Codon : 1 

PSSWQ IVCSRLBBYNSHQSLCNGTPEG PLR 
TTCTCCAGCTGGOU»TTGTGTGTA^ 

Gene : Tyros 

Segments : 20 
Offset : 286 
1st Codon : 1 

S H °* S L ° H GTPBGPLRRN P6HHDKSRTPRLP 
^GCCATCAGTCCCroTGTAAOGGAACCCCTGAGGGAOCCCTCAG 

Gene : Tyros 

Segments : 21 
Offset : 301 
1st Codon : l 

RNPGHHDXSRTPRLPSSADVBPCLSLTQYB 
AGGAATCCCGGAAACCATGACAAAAGCAGAACCCCTAGGCT^ 

Gene : Tyros 

Segments : 22 
Offset : 316 
1st Codon : 1 

SSADVEPCLSLTQYBSGSHDKAANFSPRHT 
AGCTCCGCCGATGTGGAATTCTGT^^ 
Gene : Tyros 

Segment# : 23 
Offset : 331 
1st Codon : 1 

SGSMDKAANFSFRNTLEGFASPLTGIADAS 
AGOGGAAGCATGGAOVAAGCrcCTAACTTTAGCTTTAGGAATA 



Gene : Tyros 

Segment# : 24 
Offset : 346 
1st Codon : 1 

LEGFASPLTGIADASQSSMHNALHIYMNGT 
CTGGAAGGCTTTGCCTCCCCCCTCACCGGAATCGCTGAC^ 

Gene : Tyros 

Segment# : 25 
Offset : 361 
1st Codon : 1 

QSSMHNALHIYHNGTMSQVQGSANDPI FLL 
Otf^CCAGCATGC!ACAATGCCCTCCACATTTACATGAA0GGAACCA 

Gene : Tyros 

Segments : 26 
Offset : 376 
1st Codon : 1 

H S QVQGSAKDPI FLLHHAFVDS IFEQWLQR 
ATGTCCCAGGTCCAGGCa^GCXKrrAACGATCXXam 

Gene : Tyros 

Segments : 27 
Offset : 391 
1st Codon : 1 

« H A F V D S I F BQWLQRHRPLQBVYPEANAPI 
CACCATGCCTTTGTGGATAGCATTTTCGAACAG^ 

Gene : Tyros 

Segments : 28 
Offset : 406 
1st Codon : 1 

H R P L Q B V Y F B ANAPIGHNRESYMVPFI PLY 
CAC3^GACCXXrrCCAGGAACrrUTATCCCGAAG 

Gene : Tyros 

Segments : 29 
Offset : 421 
1st Codon : 1 

GHNRBSYMVPFIPLYRNGDFFISSKDLGYD 
GGCXftTAACAGAGAGTCCTACAlt ^ T^^ 

Gene : Tyros 

Segments : 30 
Offset : 436 
1st Codon : 1 

RNGDFFZ SSKDLGYDYSYLQDSDPDSPQDY 
AGGAATGGCXSAlTlVlTrATCTCCACCAAAGACCTCGGCITlTGA 

Gene : Tyros 

Segments : 31 
Offset : 451 
1st Codon : 1 

Y S Y L Q D S DPP S F QDY I KSYLBQASR I W S W L 
TACrCCTACCTCX^AGGATAGCX^TCCCGATAGCTT^ 

Gene : Tyros 

Segments : 32 
Offset : 466 
1st Codon : 1 

IKSYLBQASRIWSWLLGAAMVGAVLTALLA 
ATCAAAAGCTATCTCGAACAGGCTAGCAGAATCTGGAGCTGGCPGCT 

Gene : Tyros 

Segment# : 33 
Offset : 481 
1st Codon : 1 

LGAAMVGAVLTALLAGLVSLLCRHKRKQLP 
CTGGGAGCCGCTATGGTCGGCGCTGTGCTCACCGClt?TGCTC 

Gene : Tyros 

Segments : 34 
Offset : 496 
1st Codon : 1 

GLVSLLCRHKRKQLPEEKQPLLMEKBDYHS 
GGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTC^ 

Gene : Tyros 

Segment# : 35 
Offset : 511 
1st Codon : 1 

EEKQPLLMEKEDYHSLYQSHLAA 
GAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGCCTCTACCA 

Gene : TRP2 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAMSPLWWGFLLSCLGCKILPGAQGQPPRV 
GCaxrrATCTCCCCCCT C TGGTGGG C^ Yrj^JX^ 

Gene : TRP2 

Segments : 2 
Offset : 16 
1st Codon : 1 

G C KI LPGAQGQFPRVCMTVDSLVNKBCCPR 

GGCTGTAAGATTCTGCCTGGOGCTCAGGGAC^ 

Gene : TRP2 

Segments : 3 
Offset : 31 
1st Codon : 1 

C M T V D S L VNKECCPRLGAESANVCGSQQGR 
TGCATGA(X!GTCXSACTCCCrGGTCAACAAAGAGTGT^ 

Gene : TRP2 

Segments : 4 
Offset : 46 
1st Codon : 1 

LGABSANVCGSQQGRGQCTBVRADTRPWSG 
CTGGGAGCCGAAAGCGCTAACGTCTGOGGAAGCCAACAGGGAAGGGG^ 

Gene : TRP2 

Segments : 5 
Offset : 61 
1st Codon : 1 

GQCTEVRADTRPWSGPYI LRNQDDRELWPR 
GGCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTCCG 

Gene : TRP2 

Segments : 6 
Offset : 76 
1st Codon : 1 

P Y I L RNQDDRELWPRKPPHRTCKCTGNFAG 
CCCTATATCCTCAGGAATCAGGATGACAGAGAGCTCTGGCCT 

Gene : TRP2 

Segments : 7 
Offset : 91 
1st Codon : 1 

K F P H RTCKCTGNPAGYNCGDCKFGWTGPMC 
AAGTTTTTCCATAGGACATGCAAATGCACftGGCAATTTO 

Gene : TRP2 

Segment# : 8 
Offset : 106 
1st Codon : 1 

YNCGDCKFGWTGPNCERKKPPVIRQNIHSL 
TACAATTGCGGAGACTGTAAGTTTGGCTGGACCGGACCCAATT^ 



Gene : TRP2 

Segment# : 9 
Offset : 121 
1st Codon : 1 

ERKKPPVIRQNIHSLSPQEREQFLGALDLA 
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACAT 
ITTCCTCGGCGCTCTGGATCTCGCT 

Gene : TRP2 

Segment# : 10 
Offset : 136 
1st Codon : 1 

SPQEREQFLGALDLAKKRVHPDYVITTQHW 
AGCCCTCAGGAAAGGGAACAGTTTCTGGGAGCCCTCGACCTC^ 

Gene : TRP2 

Segment# : 11 
Offset : 151 
1st Codon : 1 

KKRVHPDYVITTQHWLGLLGPNGTQPQFAN 
rCCTGGGACCCAATGGCACACAGCCTCAGTTTGCCAAT 

Gene : TRP2 

Segment# : 12 
Offset : 166 
1st Codon : 1 

LGLLGPNGTQPQFANCSVYDFFVWLHYYSV 
CTGGGACTGCTCGGCCCTAACGGAACCCAACCCX^ 

Gene : TRP2 

Segment# : 13 
Offset : 181 
1st Codon : 1 

CSVYDFFVWLHYYSVRDTLLGPGRPYRAID 
VTTACTCCGTGAGAGACACACTGCTCGGCCCTG 

Gene : TRP2 

Segment# : 14 
Offset : 196 
1st Codon : 1 

RDTLLGPGRPYRAIDFSHQGPAFVTWHRYH 
AGGGATACCCTCCTGGGACCCGGAAGGCCTTACAGAGCX^ 

Gene : TRP2 

Segment# : 15 
Offset : 211 
1st Codon : 1 

FSHQGPAFVTWHRYHLLCLERDLQRLIGNE 
VTGG^ZATAGGTATCACCTCCTGTGTCTGGAAAGGGATCTGCA 

Gene : TRP2 

Segment# : 16 
Offset : 226 
1st Codon : 1 

LLCLERDLQRLIGNESFALPYWNFATGRNE 
CTGCTCTGCCTCGAGAGAGACCTCCAGAGACTG^ 

Gene : TRP2 

Segment# : 17 
Offset : 241 
1st Codon : 1 

SFALPYWNFATGRNECDVCTDQLFGAARPD 
AGCrritXtX'riXXVrATTGGAAlT^ 

Gene : TRP2 

Segment# : 18 
Offset : 256 
1st Codon : 1 

CDVCTDQLFGAARPDDPTLISRNSRFSSWE 
TGCGATGTGTGTACCGATCAGCTCTTCGGAGCCGCTAG^ 

Gene : TRP2 

Segment # : 19 

Offset : 271 

1st Codon : 1 

DPTLISRNSRFSSWETVCDSLDDYNHLVTL 
GACCCTACCCTCATCTtX^tfSGAATAGCAGATTCTC 

Gene : TRP2 

Segment # : 20 
Offset : 286 
1st Codon : 1 

TVCDSLDDYNHLVTLCNGTYBGLLRRNQMG 
ACCGTCFGCGATAGCCTCGACGATTACAATC 

Gene : TRP2 

Segment # : 21 
Offset : 301 
1st Codon : 1 

CNGTYEGLLRRNQMGRNSMKLPTLKDIRDC 
TCCAATGGCACATAOSAAGGCCTCCTGAGAAGGAATCAGATGGG 

Gene : TRP2 

Segment* : 22 
Offset : 316 
1st Codon : 1 

RHSMKLPTLKDIRDCLSLQKPDNPPFFQNS 
AGGAATAGCATGAAGCTCCCCACACTCAAAGACArTAG 

Gene : TRP2 

Segment# : 23 
Offset : 331 
1st Codon : 1 

I* S L Q K F D N P P F F QNSTFS FRNALEGFDKAD 
CTGTCCCTGCAAAAGTTTGACAATCC^ 

Gene : TRP2 

Segments : 24 
Offset : 346 
1st Codon : 1 

T F S P R MALBGFDKADGTLDSQVMSLHNLVH 
ACXTTTTAGCTTTAGGAATGCCCTCGAGGGATTCGAT 

Gene : TRP2 

Segments : 25 
Offset : 361 
1st Codon : 1 

G T I»D SQVMSLHNLVHSFLNGTNALPHSAAN 
GGCACACTGGATAGCCAAGTGATGAGCCTCCACAATCTGGTCCA 

Gene : TRP2 

Segments : 26 
Offset : 376 
1st Codon : 1 

SFLNGTNALPHSAANDPI FVVLHSFTDAI F 
AGCTTTCTGAATGGCACAAACXXrrCTGCCTC^ rATCTTT 

Gene : TRP2 

Segments : 27 
Offset : 391 
1st Codon : 1 

D P I F VVLHSFTDAIFDBWMKRFHPPADAWP 
GAC(XTATCTTTGTGGTCXrrGCATAGCTTTAC^ 

Gene : TRP2 

Segments : 28 
Offset : 406 
1st Codon : 1 

DEWMKRFNPPADAHPQELAPIGHNRMYNHV 
C^CGAATGCATGAAGAGATTCAATCCCXCTGC^ 
Gene : TRP2 

Segment# : 29 
Offset : 421 
1st Codon : 1 

QELAPIGHNRMYNMVPFFPPVTNEELFLTS 
CAGGAACTGGCTCCCATTGGCCATAACAGAATGTATAACAT 

Gene : TRP2 

Segment# : 30 
Offset : 436 
1st Codon : 1 

PFFPPVTNEELFLTSDQLGYSYAIDLPVSV 
CCCTTTTTCCCTCCCGTCACCAATGAGGAAC^ 

Gene : TRP2 

Segment# : 31 
Offset : 451 
1st Codon : 1 

DQLGYSYAIDLPVSVEETPGWPTTLLVVMG 
GACCAACTGGGATACTCCTACGCTATCGATCTGCCTC 

Gene : TRP2 

Segment# : 32 
Offset : 466 
1st Codon : 1 

EETPGWPTTLLVVMGTLVALVGLFVLLAFL 
'CCTCCTGGTCGTGAT 

Gene : TRP2 

Segment# : 33 
Offset : 481 
1st Codon : 1 

TLVALVGLFVLLAFLQYRRLRKGYTPLMET 
ACCCTCGTGGCTC ^^ 

Gene : TRP2 

Segments : 34 
Offset : 496 
1st Codon : 1 

QYRRLRKGYTPLMETHLSSKRYTEEAAA 
CAGTATAGGAGACTGAGAAAGGGATACACACCCCTCATGGAAA 

Gene : MC1R 

Segment* : 1 
Offset : 1 
1st Codon : 1 

AAMAVQGSQRRLLGSLNSTPTAIPQLGLAA 
GCCGCTATGX?CrGTGCAACGCTCCCAGAGAAGGCTCCTGGGA^ 

Gene : MC1R 

Segments : 2 
Offset : 16 
1st Codon : 1 

LNS TP TAX PQLGLAANQTGARCLEVS I S DG 
CTXSAATAGCACACCCACAGCCATTCCCCAACTGGGACre 

Gene : MC1R 

Segment# : 3 
Offset : 31 
1st Codon : 1 

NQTGARCLEVSISDGLFLSLGLVSLVENAL 
AACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATTAG 

Gene : MC1R 

Segment# : 4 
Offset : 46 
1st Codon : 1 

LFLSLGLVSLVENALVVATIAKNRNLHSPM 
TCTGGTCGTGGCTACCATTGCCAAAAACAGAAACC 



Figure 27 (Cont) 

Gene : MC1R 

Segment# : 5 
Offset : 61 
1st Codon : 1 

VVATIAKNRNLHSPMYCFICCLALSDLLVS 
GTGGTCGCCACAATCGCTAAGAATAGGAATCTGCATAGC^^ 

Gene : MC1R 

Segment # : 6 
Offset : 76 
1st Codon : 1 

YCPICCLALSDLLVSGTNVI/ETAVI LLLBA 
TACTGTTTCATTTGCrGTCTGGCTCTGTCCGACCT 

Gene : MC1R 

Segment* : 7 
Offset : 91 
1st Codon : 1 

GTNVLETAVI LLLEAGALVARAAVLQQLDN 
GGCACAAACGTCCTGG AAACCXXntnt^TTCTG CAAT 

Gene : MC1R 

Segment* : 8 
Offset : 106 
1st Codon : 1 

GALVARAAVLQQLDNVIDVI TCSSMLS SLC 

Gene : MC1R 

Segment* : 9 
Offset : 121 
1st Codon : 1 

VIDVITCSSMLSSLCFLGAIAVDRYISIFY 
GTGATTGACXnt»TO«»TGCTCCAGCAT^^ 

Gene : MC1R 

Segment* : 10 
Offset : 136 
1st Codon : 1 

FLGA1AVDRYISI FYALRYHS IVTLPRAP R 
TTCCTOGGCGCTATCGCTGTGGATAGGTATATCTCCAT^ 

Gene : MC1R 

Segment* : 11 
Offset : 151 
1st Codon : 1 

ALRYHSIVTLPRAPRAVAAIWVASVVFSTL 
GCCCTCAGGTATCACrCCATCGTCACCCTCTC 

Gene : MC1R 

Segment* : 12 

Offset : 166 

1st Codon : 1 

AVAAIWVASVVFSTLFIAYYDHVAVLLCLV 
GCCtTltllX'CtXTATCTGGGTGGCT 

Gene : MC1R 

Segment* : 13 
Offset : 181 
1st Codon : 1 

FIAYYDHVAVLLCLVVFFLAMLVLMAVLYV 
TTC A TT GC CT A 7T A CGATCACGTCSCCGTCCT G CT 

Gene : MC1R 

Segment* : 14 
Offset : 196 
1st Codon : 1 

VPPLAMLVLMAVLYVHMLARACQHAQGIAR 
GTGTTrrTCCTC GCCA T GC TGGTCCT G ATGGCCGTC^ 

Gene : MC1R 

Segment* : 15 

Offset : 211 

1st Codon : 1 
HMLARACQHAQGIARLHKRQRPVHQGFGLK 
CACATGCTGGCTAGGGCTTCCCAACACGCTCAGGGAA 

Gene : MC1R 

Segments : 16 
Offset : 226 
1st Codon : 1 

LHKRQRPVHQGPGLKGAVTLTI LLGIPPLC 
CIGCATAAGAGACftGAGACCCGTCCftCCAAGG 

Gene : MC1R 

Segments : 17 
Offset : 241 
1st Codon : 1 

GAVTLTI L L G I PPLCWGPFFLHLTLIVLCP 
GGCGCTGTGACACTGACAATCCTCCK^^T CIVm 



Gene : MC1R 

Segment# : 18 
Offset : 256 
1st Codon : 1 

WGPFFLHLTLIVLCPEHPTCGCIFKNFNLF 
\TCTTTAAGAATTTCAATCTGTTT 



Gene : MC1R 

Segment^ : 19 
Offset : 271 
1st Codon : 1 

EHPTCGC IFKNFNLFLAL I ICNAI IDPLIY 
GAGCATCCCACATGCGGATGCATTTTCAAAAACTTTAA 



Gene : MC1R 

Segments : 20 
Offset : 286 
1st Codon : 1 

LALIICHAIIDPLIYAFHSQELRRTLKEVL 
CTGGCTCTGATTATCTtnAACGCTATCATTGA 



Gene : MC1R 

Segments : 21 
Offset : 301 
1st Codon : 1 

AFHSQBLRRTLKBVLTCSWAA 
GCCTTTCACTCCCTVGGAACrrGAGAAGGACACTG 



Gene : MUC1F 

Segment# : 1 
Offset : 1 
1st Codon : 1 

AAMTPGTQSPFFLLLLLTVLTVVTGSGHAS 
GCCGCTA' 
VTGCCTCC 


Gene : MUC1F 

Segment# : 2 
Offset : 16 
1st Codon : 1 

LLTVLTVVTGSGHASSTPGGEKETSATQRS 
CTCCTCACCGTCCTGACAGTGGTCACCGGAAGCGGACAC^ 

Gene : MUC1F 

Segment# : 3 
Offset : 31 
1st Codon : 1 

STPGGEKETSATQRSSVPSSTEKNAVSMTS 
AGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGC^ 

Gene : MUC1F 

Segment# : 4 
Offset : 46 
1st Codon : 1 

SVPSSTEKNAVSMTSSVLSSHSPGSGSSTT 

ATGACAAGCrCXGTGCTCAGCTCCCACTCCCCCGG 
Gene : MUC1F 

Segment# : 5 
Offset : 61 
1st Codon : 1 

SVLSSHSPGSGSSTTQGQDVTLAPATEPAS 
AGCGTCCTGTCCAGCCATAGCCCTX3G<rc 

Gene : MUC1F 

Segment# : 6 
Offset : 76 
1st Codon : 1 

QGQDVTLAPATEPASGSAATWGQDVTSVPV 
CAGGGACAGGATGTGACACTGGCTCCCXSCrA 



Gene : MUC1F 

Segment# : 7 
Offset : 91 
1st Codon : 1 

GSAATWGQDVTSVPVTRPALGSTTPPAHDV 
VTGACGTC 


Gene : MUC1F 

Segment* : 8 
Offset : 106 
1st Codon : 1 

TRPALGSTTPPAHDVTSAPDNKAA 

ACCAGACCCGCTCTGGGAAGCACAACCCCTCCCG 

Gene : MUC1R 

Segment* : 1 
Offset : 1 
1st Codon : 1 

AANRPALGSTA PPVHNVTSASGSASGSAST 
GCCGCTAACAGACCCGCTCTGGGAAGCACAGCCCCTC^ 

Gene : MUC1R 

Segment* : 2 
Offset : 16 
1st Codon : 1 

NVTSASGSASG SASTLVHNGTSARATTTPA 
AAOGTCACCTCOGCCTCCGGCTCOGCCTC 

Gene : MUC1R 

Segment* : 3 
Offset : 31 
1st Codon : 1 

LVHNGTSARATTTPASKSTPPS I PSHHSDT 
CTGGTCCACAATGGCACAAGCGCTAGGGCTACCAC^ 

Gene : MUC1R 

Segment* : 4 
Offset : 46 
1st Codon : 1 

SKSTPFSIPSHHSDTPTTLASHSTXTDASS 
AGCAAAAGCACACCCTT TA GC A TTCCCTCCCACCATO 

Gene : MUC1R 

Segment* : 5 
Offset : 61 
1st Codon : 1 

PTTLASHSTKTDASSTHHSSVPPLTSSNHS 
CCCACAACCCTCGCCTCCC3^tXaCCAA 




Gene : MUC1R 

Segment# : 6 
Offset : 76 
1st Codon : 1 

THHSSVPPLTSSNHSTSPQLSTGVSFFFLS 
^TAGCACAAGCCCTtaGCTCAGCACAGGCGTC ^ 

Gene : MUC1R 

Segment# : 7 
Offset : 91 
1st Codon : 1 

TSPQLSTGVSFFFLSFHISNLQFNSSLEDP 
ACCTCO:CCCAACTGTCCACrGGA G T ^ 



Gene : MUC1R 

Segment# : 8 
Offset : 106 
1st Codon : 1 

FHISNLQFNSSLEDPSTDYYQELQRDISEM 
TTCCATATCTCCAACCTCCAGTTTAACTCCAGCC^ 



Gene : MUC1R 

Segment# : 9 
Offset : 121 
1st Codon : 1 

STDYYQELQRDISEMFLQIYKQGGFLGLSN 
AGCACAGACTATTACrO^AGAGCTCCAGAGAGACATTAG 



Gene : MUC1R 

Segment# : 10 
Offset : 136 
1st Codon : 1 

FLQIYKQGGFLGLSNIKFRPGSVVVQLTLA 
TTCCTCCAGATTTACAAAC 
rAAGTTTAGGCC 

Gene : MUC1R 

Segment# : 11 
Offset : 151 
1st Codon : 1 

IKFRPGSVVVQLTLAFREGTINVHDVETQF 
ATCAAAT 
ITCAATGTGCATGACGTCGAGACACAGTTT 



Gene : MUC1R 

Segment # : 12 
Offset : 166 
1st Codon : l 

FREGTINVHDVETQPNQYKTBAASRYNLTI 
TTCAGAGAGGGAACCATTAACGTCCACX^TGTG 



Gene : MUC1R 

Segment# : 13 
Offset : 181 
1st Codon : 1 

NQYKTEAASRYNLTISDVSVSDVPFPFSAQ 
AACCAATACAAAACCGAAGCCGCTAGCAGATACAATCTGAO^T 

Gene : MUC1R 

Segment# : 14 
Offset : 196 
1st Codon : 1 

SDVSVSDVPFPFSAQSGAGVPGWGIALLVL 
AGCGATGTGTCCGTGTCCX^VCGTCCXltrn^ 

Gene : MUC1R 

Segment# : 15 
Offset : 211 
1st Codon : 1 

SGAGVPGWGIALLVLVCVLVALAIVYLIAL 
VrTGCCCrCCTGGTCCTGGTCTGCGTCC^ 



Gene : MUC1R 

Segments : 16 
Offset : 226 
1st Codon : 1 

VCVLVALAIVYLIALAVCQCRRKNYGQLDI 

ciTyitntyrGcrcG-fG^^ 



Gene : MUC1R 

Segment# : 17 
Offset : 241 
1st Codon : 1 

AVCQCRRKNYGQLDIFPARDTYHPMSEYPT 
GCCGTCTGCC^TGCAGAAGGAAAAACTATGGCC^ 

Gene : MUC1R 

Segment* : 18 
Offset : 256 
1st Codon : 1 

FPARDTYHPMSBYPTYHTHGRYVPPSSTDR 
TTOCCTGCOU3AGACACATACCATCCCATGAGCGAATA 

Gene : MUC1R 

Segment # : 19 
Offset : 271 
1st Codon : 1 

YHTHGRYVPPSSTDRSPYEKVSAGNGGSSL 
TACCATACCCATGGCAGATACGTCCCCCCTAGCT 

Gene : MUC1R 

Segment # : 20 
Offset : 286 
1st Codon : 1 

SPYBKVSAGNGGSSLSYTNPAVAAAS ANLA 
AGCCCTTACXSAAAAGGTCAGCGCTGGCAAT^ 



Gene : MUC1R 

Segment* : 21 
Offset : 301 
1st Codon : 1 

SYTNPAVAAASANLAA 
AGCTATACCAATCCCXX7rGTGGCTGCCGCn7VGCG 



Segments in scrambled order: 
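
The list that follows presents the same segments in a shuffled order. This section does not say how the order was randomised, so the sketch below is only one plausible reading: a uniform shuffle of (label, peptide) pairs followed by concatenation. The names `scramble` and `join_peptides` and the optional `seed` parameter are hypothetical. The three example peptides are taken from records in this listing.

```python
import random
from typing import List, Optional, Tuple

Segment = Tuple[str, str]  # (label such as "gp100 #4", peptide)


def scramble(segments: List[Segment], seed: Optional[int] = None) -> List[Segment]:
    """Return a shuffled copy; the input list is left untouched."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return shuffled


def join_peptides(segments: List[Segment]) -> str:
    """Concatenate peptides in the given order into one polypeptide."""
    return "".join(peptide for _, peptide in segments)


ordered = [
    ("gp100 #4", "WNRQLYPEWTEAQRLDCWRGGQVSLKVSND"),
    ("gp100 #36", "ACMEISSPGCQPPAQRLCQPVLPSPACQLV"),
    ("MC1R #9", "VIDVITCSSMLSSLCFLGAIAVDRYISIFY"),
]
mixed = scramble(ordered, seed=1)
print([label for label, _ in mixed])
print(len(join_peptides(mixed)))  # 90
```

A shuffle keeps every segment exactly once, so the joined polypeptide has the same total length as the ordered one and only the neighbourhood of each segment changes.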



gp100 #4 

WNRQLYPEWTEAQRLDCWRGGQVSLKVSND 
TGGAATAGGCAACTGTATCCtXiAATGGACAGAGGCTCAG^ 

TRP2 #6 

P Y I L R NQDDRBLW PRKP PHRTCKCTGNPAG 
CCCTATATOCTCAGGAATCAGGATGACAGAGAGCTC^ 

Tyros #30 

RNGDPPISSKDLGYDYSYLQDSDPDSFQDY 
AGGAATGGCGAUTI'Ln'l^'ATCTCCAGCAAAGACCTCGGCT 

TRP-1 #1 

AAPAPLTWHRYHLLRLBKDMQEMLQBPSFS 
GCCGCTCCCGCTTTCCTCACCTGGCAC3^GA 

Tyros #29 

G H M R B S Y M V P P I PLYRNGDFFISSKDLGYD 
GGCCATAACAGAGAGTCCTACATGGTGCCITTCAT^ 

TRP2 #16 

LLCLERDLQRLIGNESFALPYWNFATGRNE 

gp100 #23 

TTBVVGTTPGQAPTAEPSGTTSVQV PTTBV 
ACCACAGAGGTCGTGGGAACCACACCCGGACAGGCTCCC^ 

MUC1R #9 

S T D Y Y Q E h Q RDISEMPLQIYKQGGPLG LSN 
AGCACAGACTATTACCAAGAGCTCCAGAGAG^ 

gp100 #36 

ACMBI SSPGCQPPAQRLCQPVLPSPACQLV 
GCCTGTATGGAAATCrCCAGOCCTCGCTGTCAG^ 

TRP2 #31 

DQLGYSYAIDLPV SVBBTPGWPTTLLVVMG 
GACCAACTGGGATACTCCTACGCTATCGAT 



ATGGGA 



TRP-1 #7 

TEDGPIRRN PAGNVARPMVQRLPE PQDVAQ 
ACCGAAGACGGACCCATTAGGAGAAACCCTGCCGGAAAC^ 



TRP2 #3 

CMTVDSLVN KBCCPRLGABSANVCGSQQGR 
TGCATGACCGTCGACTCCCTGGTCAACAAAGAGTGTT^ 

MUC1R #13 

NQYKTBAAS RYNLTISDVSVSDVPFPFSAQ 
AACCAATACAAAACOGAAGCCGCTAGCAGATACAATCTGACAATCTCOGACXS^ 



TRP2 #1 

AAMSPLWWGFLLSCLGCKILPGAQGQFPRV 
GCCGCTAT 
rCTGCTCAGCTGTCTGGGATGCAAAATCCTCCCCGGA^ 



gp100 #18 

A ° L S Y T W D P G DSSGTLISRALVVTHTYLBP 
GCCGATCTGTCCTACACATGGGATTTCXSGAGACTCC^ 



gp100 #27 

LAEMSTPEATGMTPAEVSIVVLSGTTAAQV 
\TGACCCCTGCOGAAGTGTCCAT 



MUC1R #11 

IKFRPGSVVVQLTLAFREGTINVHDVBTQF 
ATCAAATTCAGArcCGGAAGCCTCGTGGTCCAGCTCAC 



MUC1F #7 

GSAATWGQDVTSVPVTRPALGSTTPPAHDV 
GGCTCCGCCGCTACCT 
nXXTCTCGGCTCCACCACACXXXXTGCCCATGACGTC 



MC1R #16 

LHKRQRPVHQGFGLKGAVTLTILLGIFFLC 
CTGCAT 
7TC^CCCTCACCATTCT GCTC GGCA l 'Ti'i C Tl m l\T l \»lX^ 



MC1R #20 

LALIICNAI IDPLIYAFHSQEIiRRTLKBVL 
CTGGCTCTGATTATCTGTAACGCTATCATTGACCCTC^ 

TRP2 #7 

K FF HRTCKCTGW FAGYNCGDCKFGWTG PNC 
AAGTTTTTCCATAGGACATGCAAATGCACAGGCAATTTCGCT 

TRP2 #23 

L S L Q K F D NPPFFQNSTFSPRHALEGFDKAD 
CrGTCCCTGCAAAAGTTPGACAATCCCCCTTTCTTrCACAAT^ 



MUC1R #4 

SKSTPFSIPSHHSDTPTTLASHSTKTDASS 
^TTCCCTCCCACCATAGCGATACCCCTACCACACTGG 



MUC1R #1 

AANRPALGSTAPPVHNVTSASGSASGSAST 
^"nnxyiCAAGCGCTAGCGGAAGCGCTAGCGGAAGCGCTAGCACA 



TRP2 #21 

CNGTYBGLLRRNQMGRHSMKLPTLKDIRDC 
TGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGG 



MUC1R #6 

THHSSVPPLTSSNHSTSPQLSTGVSFFFLS 
ACCCAT 



MC1R #13 

PIAYYDHVAVLLCLVVFFLAMLVLMAVLYV 
TTCATTCCCTATTACGATCACGTOGCCGTCCT GC TC^ 



Tyros #16 

KLTGDENFTIPYWDWRDAEKCDICTDEYMG 
AAGCTCACCGGAGACGAAAACTTTACCATTCCCTATTGG^ 

gp100 #32 

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI 
CTGAGACTGGTCAAGAGACAGGTCCCCCTC^^ 

MUC1R #10 

F JL_? I Y K Q G G PL G US NIKFRPGSVVVQLTLA 
TTCCTCO^TTTACAAACAGGGAGGCTTTCT^ 

MC1R #9 

VIDVITCSSMLSSLCFLGAIAVDRYISIFY 
GTGATrGAGGTCATCACATGCTCCAGCATGCTGTCQ^ 

Tyros #21 

RHPGNHDKSRTPRLPSSADVEFCLSLTQYE 
AGGAATCCOSGAAACCATGACAAAAGCAGA^ 

TRP-1 #14 

FDBWLRRYNADISTFPLBNAPIGHNRQYNM 
TTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAGCA 

gp100 #39 

V S LADTNS LAVVSTQLI M PGQEAGLGQVP h 
GTGTCCCTGGCTCACACAAACTXXXTGGCT 

gp100 #20 

G P V TAQVVLQAAIPLTSCGSSPVPGTTDGH 
GGCCCTGTGACAGCCCAAGTGGTCCTGCAAGCCGCTA 

Tyros #8 

K F Q P W GPNCTERRLLVRRNI FDLSAPEKDK 
AAGTTTGGCTTTTGGGGACCCAATTGCACAGAGAGAA 

gp100 #13 

L G T__ H TMEVTVYHRRGSRSYVPLAHSSSAFT 
CTGGGAACCCATACCAlGGAGGTCACanCT 

MC1R #12 

A V A A I H V A S V V F S TLF1AYYDHVAVLLCLV 
GCCGTCGCeGCTATCTGGGTGGCTAGre 

TRP2 #25 

GTLDSQVMSLHNLVHSFLNGTNALPHSAAN 
GGCACACTGGATAGCCAAGTGATGAGCCTCCAC^ 

MART #4 

G CWY CRRRNGYRALMDKSLHVGTQCAI.TRR 
GGCTGTTGGTATTGCAGAAGGAGAAACGGATACAGAGCCCTCA 

Tyros #15 

P W H R L F L LRWEQEIQKLTGDENFTI PYWDW 
CCCTGGCACAGACTGTTTCTGCTCAGGTG^ 

MC1R #1 

A A M AVQ GSQRRLLGSIiNSTPTAZ PQLGLAA 
GCCGCTATGGCTGTGCAAGGCTCCCAGAGAAGGCTCC^ 

MC1R #5 

VVATIAKNRHLHS PMYCFI CCLALSDLLVS 
GTGGTCGCCACAATCGCTAAGAATAGGAATCTGC^ 

Tyros #25 

Q S S M H N A L H I Y HNGTHSQVQGSANDPI F I* I* 
CAGTCCAGCATGCACAATGCCCTCCACATTTACATGAAOGGA 

Tyros #18 

C Q HPT H P M LLSPASFFSSHQIVCSRLBEYN 
GGCCAACACCCTACCAATCCCAATCTGCTCAG 

MC1R #6 

YCFICCLALSDLLVSGTNVLETAVILLLEA 
TACTGTTTCATTTGCTGTCTGGCTC 

TRP2 #19 

° * 7 L 1 S RNSRFSSW BTVCDSLDDYNHLVTL 
GACCCTACXCTCATCTCCAGGAATJUX^GATTCTCC^ 

MUC1F #8 

TRPALGSTTPPAHDVTSAPDNKAA 
ACCAGAaXXXrrCTGGGAAGCACAACCCCTCCCG^ 

Tyros #17 

R ° A E K __ C __ D ICTDBYMGGQHPTNPNLLS PASF 
AGG^TGCCGAAAAGTGTGACOTTTGCACAGAC^ 

gp100 #17 

TP p A LQLHDPSGYLABADLSYTWDFGDSSGT 
ACCTTTGCCCTCC»GCItX»CGATCC^ 

Tyros #22 

SSADV BFCLSLTQYESGSMDKAANFSFRNT 
AGCTCCGCCGATGTGGAATTCTGTCTGTCCCTGACACAGTAT^ 

gp100 #6 

GPTLIGANASFSIALNFPGSQKVLPDGQVI 
GGCCCTACXTCTCATCGGAGCCAATGCCrcCTTCTCCATC^ 

MC1R #18 

W G P F F **HLTLIVLCPEHPTCGC I FKNFNLF 
TGGGGACCXTITTTTCCTCCACCra 

Tyros #7 

C Q C S G N F H G F NCGNCKFGFWGPNCTERRLL 
TGOCAATGCTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTG 

TRP2 #34 

QYRRLRKGYTPLMBTHLSSKRYTBBAAA 
CAGTATAGGAGACTGAGAAAGGGATACACACCCCTCATGGAAACCCATCTG^ 

TRP-1 #15 

PLBNAPIGHNRQYNMVPFWPPVTNTBMPVT 
CCCCTCGACAATGCCCCTATCQGACACAATAGGC^ 

gp100 #7 

w p P G S Q KVLPDGQVIWVWMT I.I WGSQVWGG 
AACTTTCCOGGAAGCCAAAAGGTCCTGCCTGACGGACAGGTCATC 

gp100 #22 

RPTABAPNTTAGQVPTTBVVGTTPGQAPTA 
AGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAG^^ 

MUC1F #3 

STPGGEKBTSATQRSSVP5STEKNAVSMTS 
AGCACACCCGGAGGCGAAAAGGAAACCTCaXTJ^ 

gp100 #42 

LIYRRRLMKQ D F SVPQLPHSSSHWLRLPRI 
CTGATTTACAGAAGGAGACTGATGAAGOUVGACTT^ 

TRP2 #12 

Jyg L J* g P ** G T Q P Q FANCSVYD PPVHLHYYSV 

CrGGGACTGCTOGGCCCTAAOGGAACCCAACCOCAATICGCTA^ 1 1 TL l TltjTCrJXJG CTGCATTACTATAGCGTC 

TRP-1 #9 

CLBVGLPDTPPFYSNSTMSFRNTVBGYSOP 
TGCCTaSAGGTCXSGCCTCTTCGATA ^ 

gp100 #1 

A A M D L V L KRCLLHLAVI GALLAVGATRVPR 
GCCGCTATGGATCTGGTCCTGAAAAGGTGTCTGCTCX^^ 

MC1R #3 

HQTGARCLBVSISDGLFLSLGLVSLVBMAL 
AACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATTAGCGATGGCCTCTTCCT 

Tyros #23 

SGSMDKAA W F S F R NTLB6 FASPLTGIADAS 
AGCGGAAGCATGGACAAAGCCGCTAACTTTAGCTTTAGGAAT^ 

Tyros #4 

SPCGQLSGRGSCQNILLSNAPLGPQFP FTG 
AGCCCTTGCGGACAGCTCAGCGGAAGGGGAAGCItn^ 

Tyros #13 

M H Y Y V S MDALLGGSEIWRDIDFAHBAPAFL 
ATGCTVTTACI^TGTGTCCATGGATGCCCTCCTGGGAGGCT 

Tyros #35 

BBKQPLLMEKEDYHSLYQSH L A A 
GAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTAO» 

TRP2 #5 

GQCTEVRADTRPWSGPYI LRKQDDRBLWPR 
GGCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTC^ 

MUC1F #4 

SVPSSTBKNAVSMTSSVLSSHSPGSGSSTT 
AGCGTCCCCTCCAGCACAGAGAAAAACXXrnn^ 

Tyros #12 

TPMFNDINIYDLFVWMHYYVSMDALLGGSB 
ACCCCTATGTTTAACGATATCAATATCTATGACCTCT^ 

gp100 #9 

QPVYPQETDD'ACIFPDGG PC PSGSWSQKRS 
CAGCCTGTGTATCCCCAAGAGACAGACGATGCCTGTAT^^ 

TRP-1 #6 

DSLBDYDTLGTLCNSTEDGP IRRNPAGNVA 
GACTCCCTtXSAAGACTATGACACACTGGGAAC^ 

gp100 #8 

WVNNTI IHGSQVWGGQPVYPQBTDDAC I FP 
TGGGTCAACAATACCATTATCAATGGCTCCCAGGTCTGGGGAGGCCAA^ 

MART #7 

QEKNCBPVVPNAPPAYEKLSAEQSPPPYSP 
CAGGAAAAGAATTGCGAACCCG'iXX^lt^CCTAACGC^^ 

gp100 #14 

SRSYVPLAHSSSAFTITDQVPFSVSVSQI*R 
AGCAGAA gC T A TOl ^ tritrrt ^ 

TRP-1 #2 

LBKDMQEMLQBPSFSLPYWN FATGKNVCDI 
CTGGAAAAGGATATGCAAGAGATGCTGCAAGAGCCTAGCTTT^ 

TRP-1 #16 

VPFWPPVTHTBMFVTAPDNLGYTYBAA 
GTGCCTTTCTGGCCCCCTt7TGACAA 

TRP2 #13 

CSVYDFFVtfLHY YSVRDT I> L G P G R P Y R A ZD 
TGCTCCGTGTATGACTTTTTCGTCTGGCT^ 

Tyros #9 

VRRNI FDLSAPBKDKFFAYLTLAKHTI SSD 
GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAA 

MART #2 

KKGRGHSYTTABBAAGIGILTVI h G V L L L I 
AAGAAAGGCCATGGCCATAGCTATACCACAGCCGAAGA 

gp100 #11 

FVYVHK THGQYWQVLGGPVSGLS I GTGRAM 
TTCGTCTACGTCTGGAAAACCTG^ 

gp100 #12 

G G P V SGLSIGTGRAML6THTMKVTVYHRRG 
GGCGGACCCGTCAGCGGACTtnXXATCGGAACCGGAAG 

gp100 #25 

ISTAPVQMPTABSTGMTPBKVPVSEVMGTT 
ATCTCCACCGCTCCCXrrCCAGATGCCCACAGCCGAAAGCA 



Tyros #19 

P S S W 0 I V C SRLEEYNSHQSLCNGTPEGPLR 
TTCTCCAGCTGGCAGATTGTGTGTAGCAGACTGGAAGAGTATAA 

TRP2 #27 

DP1PVVLHSPTDAIPDBWMKRPNPPADAWP 

GACCCTATCTTTGTGGTCCTGCATAGCTTTACCGA 

MC1R #15 

H M h A R AC QHAQGIARLHKRQRPVHQGPGLK 
CACATGCTGGCTAGGGCTTGCCAACAOGCTCAGGGAATCGCTAGG 

MUC1F #2 

LLTVLTVVTGSGHASSTPGGEKBTSATQR S 
CTGCTCACCGTCCTGACAGTGGTCACCGGAAGCGGACA 

gp100 #44

FCSCPIGBNS PLLSGQQVAA 
TTCTGTAGCTGTCCCATTGGCGAAAACTCCCCC^ 

TRP2 #24 

T F S F RNALBGFDKADGTLDSQVMSLHNLVH 
ACCTTTAGCTTTAGGAATGCCCTCGAGGGATTCGATAAG(^ 

Tyros #20 

S H Q — S —- ° M G T P B GPLRRNPGNHDKSRTPRLP 
AGCCATCAGTCCCTGTGTAACGGAACXXCTGAGGGAC^^ 

TRP2 #30 

P F g PFV THBBLFLTSDQLGYSYAIDLPVSV 
CCCTTTTTCCCTCCCGTCACCAATGAGGAACltil 

TRP2 #9 

BRKKPPVIRQNIHSLSPQBRBQFLGALDLA 
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACATTCA^ 

TRP2 #29 

QBLAPIGHHRMYNMVPPPPPVTNBBLFLTS 
CAGGAACTGGCTCCCATTGGCCATAACAGAATGTATAACAT^^ 



gp100 #28

B V S I V V L S G TTAAQVTTTBWVBTTARBLPI 

GAGGTCAGCATTGTGGTCCTGTCCGGCACAACCGCT^ 

MUC1R #7 

TSPQLSTGVSFFFLSPHISNLQPHSSLBDP 
ACCTCCCCCCAACTGTCCACCGGAGTGTCCllVr 

MUC1R #19

Y H T H G R Y V P P S STDRSPYBKVSAGNGGSSL 
TACCATACCCATGGCAGATACGTCCCCCCTAGCTCC^ 

MC1R #4 

LPLSLGLVSLVENALVVATIAKNRNLHSPM 
CTGTTTCTGTCCCTGGGACTGGTCAGCCTCG 

TRP2 #26 

SFLMGTHALPHSAANDPIPVVLHSPTDAIF 
AGCTTTCTGAATGGCACAAACGCTtrreCCTCA^ 

MUC1R #17 

AVCQ CRRKHYGQLDI PPARDTYHPHSBYPT 
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GCCCTCTGCCAATGCAGAAGGAAAAACTATG 
MC1R #14 

VPFLAMI.VLMAVLYVHMLARACQHAQGIAR 
GTGTTTTr CCTCGCCATGCTGGTCCTGATC 

TRP-1 #10 

STNSFRNTVEGYSDPTGKYDPAVRS LHNLA 
AGCACAAACTCCTTCAGAAACACAGTG 

TRP-1 #3 

LPYWNFATGKNVCDICTDDLMGSRSNPDST 
CTGCCrrACTGGAACTTTGCCACAGGCAAAAAOGTCTGCG^ 

VSQLRALDGGNKHPLRNQPL 
ATGGCGGAAACAAACACTTTCTGAGAAACCAACCCCTC 

MUC1R #8

FHISNLQFNSSLEDPSTDYYQBLQRDISBM 
TTCCATATCTCCAACCTCCAGTTTAACTCCAGCCTC 

MUC1R #20 

S PYBKVSAGNGGSSLSY 
AGCCCTTACGAAAAGGTCAGCGCTGGCAATGGCGGAAGC^^ 

Tyros #11 

YVI PIGTYGQMKNGSTPMFNDINIYDLFVW 
TOCGTC^TCCCTATCGGAACCTATGGCCAAATGAAAAACGGAA 

gp100 #37

RLCQPVIiPS PACQItVLHQI LKGGSGTYCLN 
AGGCTCTGCCAACCCCTrCCTGCCTA^ 

gp100 #33

R Y G S F s v TLDIVQGIBSABILQAVPSGBGD 
AGGTATGGCTCVriXTtXXrftaC!ACTGGATATOGT^ 

VDSIFEQWLQRHRPLQBVYPBANAPI 
ATAGCATTTTCGAACAGTGGCTGCAAAGGCATAGGCC^ 

TRP-1 #4 

CTDDLMGSRSNFDSTLISPNSVPSQWRVVC 
TGCACAGAQ^TCTGATGGGCTCCAGGTCC 

MUC1R #18 

F PARDTYHPMS BYPTYHTHGRYVPPSSTDR 
TTCCCTGCCAGAGACACATACCATCCCATGAGOT 



MC1R #19 

BHPTCGC1PKNFHLFLALI ICNAI IDPLIY 
GAGCATCCCACATGCGGATGCATTTTCAAAAACTTTA^ 

Tyros #26 

MSQVQGSANDPIFLLHHAFVDS I P E Q W L Q R 
ATGTCCCAGGTCCAGGGAAGGGCTAACGATCCCATTTTCCTCCTGC^ 

TRP2 #22 

RNSHKLPTLXDIRDCLSLQKFDNPPFP QNS 
AGGAATAGCaTGAAGCTCCCCACACTGAAAGAeATTAGGG A ^ 

gp100 #19

LISRALVVTHTYL 
TRP2 #17 

SPALPYWHFATGRNBCDVCTDQLFGAARPD 
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AGCTTTGCCCTCCCCTATTGGAATTTCGCT 
gp100 #2

VIGALLAVGATKVPRNQDWLGVSRQLRTKA 
GTGATTGCCGCTCTGCTCGCCGTCGGCXJCTACCAAA^ 

gp100 #16

ALDGGNKHFLRNQPLTFALQLHDPSGYLAE 
GCCCTCGACGGAGGCAATAAGCATTTCCrCAGGAATCAGCCTC^ 

TRP2 #18 

C P V C TDQLFGAARPDDPTLISRNSRFSSWB 
TGCGATGTGTGTACCGATCAGCTCTTCGGAGCCGCTAGG 

MART #1

AAMPRBDA H F IYGYPXKGHGHS YTTAEEAA 
GCCGCTATGCCTAGGGAAGACGCTCACTTTATCTATGGCTATCCCA^ 

TRP-1 #11 

T G KYDPAVRSLHNLAHLFLNGTGGQTHLSS 
ACCGGAAAGTATGACCCTGCCGTCAGGTCCCTGCATAACCTC^ 

MUC1R #14 

SDVSVSDVPFPFSAQSGAGVPGWGIALLVL 
AGCGATGTGTCCGTGTCCGACGTCCCCTTT^^ 

TRP2 #10 

SP Q E R B Q F LGALDLAKKRVHPDYVITTQHN 
AGCCCTCAGGAAAGGGAACAGTTTCTGGGAGCCC^ 

Tyros #10 

FFAYLTLAKHTISSDYVI PIGTYGQMKNGS 
TTCTTTGCCTATCTGACACTGGCTAAGCATACCATTAGC^ 

MC1R #7 

G T N V L BTAVI LLLEAGALVARAAVLQQLDN 
GGCACAAACGTCCTGOAAACCGCTGTGATTCTGCTCCTGG^ 

MUC1R #16

VCVLVALAI VYLIALAVCQCRRKNYGQLD I 
GTGTGTGTGCTCGTGGCTCTGGCTATCC 

MART #6 

C P Q B G F D HRDSKVSLQEKNCEPVVPNAPPA 
TGC<XTCAGGAAGGCTTTGACCATACGGATAGCAAAGTGTCC^^ 

MUC1F #5 

SVLSSHSPGSGSSTTQGQDVTLA PATE PAS 




TRP2 #28 

DBWMKRFNPPADAWPQBLAPIGHNRMYN/IV 
GACSAATGGATGAAGAGATTCAATCCCCCTGCC^ 

MC1R #21

AFHSQBLRRTLKEVLTCSWAA 
GCCTTFCACTCCCAGGAACTGAGAAGGACACTGAAAGAGG 

TRP2 #15 

* S H Q G P A F V T WHRYHLLCLBRDLQRLIGNB 
TTCTCCCACCAAGGCCCTGCCTITGTGACATC 

TRP-1 #8 

R P M V Q R L P B PQDVAQCLBVGLFDTP PFYSH 
AGGCCTATGGTCCAGAGACTGCCTGAGCCTCaGGAT^^ 

TRP-1 #13 

Q D P I F VLLHTFTDAVPDBWLRRYNADISTF 
CAGGATCCC A lTri tA ri ^ 

TRP2 #4 

LGAESAHVCGSQQGRGQCTBVRADTRPNSG 
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CTTXJGAGCCGAAAGCGCTAACGTCTGCGGAAGC^ 
TRP2 #8 

YNCGDCKFGWTGPNCERKKPPVI R Q N I H S L 

TACAATTGCGGAGACTGTAAGTTTGGCTGGACXXSGACra 

TRP-1 #12 

HLPLNGTGGQTHLSSQDPI FVLLHTFTDAV 
CACCTCTTCCTCAACGGAACCGGAGGCCAAACCCATC^ 

Tyros #34 

GLVSLLCRHKRKQLPEBKQPLLMEKEDYHS 
GGCCTCK3TCTCCCTGCTCTGCAGACACAAAAGG 

TRP2 #2 

GCKILPGAQGQFPRVCMTVDSLVNKBCCPR 
GGCTGTAAGATTCTGCCTGGCGCTCAGGGACAGTTTC^ 

gp100 #43

QLPHSSSHWLRLPRIFCSCPIGBNS PLLSG 
CAGCTCCCCCATAGCTCCAGCCATTGGCTCftGGCTCCC^ 

gp100 #10

DGGPCPSGSWSQKRSPVYVWKTWGQYWQVL 
GACGGAGGCCCTTGCCCTAGCXSGAAGCTGGAGCCAAA^^ 

gp100 #3

NQDWLGVSRQLRTKAWNRQLYPEWTEAQRL 
AACCAAGACTGGCTGGGAGTGTCCAGGCAACTGAGAA 

Tyros #14 

IWRDI DFAHEAPAFLPWHRLFLLRW EQBI Q 
ATCTGGAGGGATATCGATTTOGCTCAGGAAGCCCCTGCCTTTCTtS^ 

MUC1F #1 

AAMTPGTQSPPFLLLLLTVLTVVTGSGHAS 
GCCGCTATGACACCCGGAACCCAAAGC CV r riVrr f C 

MART #5 

DRSLHVGTQCALTRRCPQBGFDHRDSKVSL 
GACAAAAGCCTCCACGTCGGCACACAGTGTGCCCTCA 

MUC1R #2 

NVTSASGSASGSASTLVHNGTSARATTTPA 
AACGTCACCTCCGCXrrcaSGCTCCGCC^^ 

Tyros #24 

LEG FASPLTGIADASQSSMHN ALHI YMNGT 
CTGGAAGGCTTTGCCTCCCCCCTCACCXSG^ 

TRP2 #14 

RDTLLGPGRPYRAIDPSHQG PAFVTWHRYH 
AGGGATACCCT(XTGGGACCCGGAAGGCCTTACAGAGCCATrc 

Tyros #1 

AAMLLAVLYCLLWSFQTSAGH FPRACVSSK 
GCXlGCTATGCTCCTCGCTGT GC I'CrACTCirJt X ^ 

gp100 #35

AFBLTVSCQGGLPKBACMEIS SPGCQP P A Q 
GCCTTTGAGCTCACCGTCAGCTGTCAGG^ 

Tyros #6 

VDDRBSN PSVFYHRTCQCSGN PMGFNCGNC 
GTGGATGACAGAGAGTCCTGGCCTAGCGTCTTCTATAAC^ 

gp100 #34

BSABILQAVPSGBGDAFBLTVSCQGGLPKB 
GAGTCCGCCGAAATtXTCCAGGCTGTGCCTAGCGGAG 

TRP2 #20 

TVCOSLODYNHLVTLCNGTYBGLLRRHQMG 
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ACCGTCTGCGATAQCCTCGAC^ 
Tyros #5 

LLSNAPLGPQFPPTGVDDRBSWPSVFYNRT 
CTGCTOVGCAATGCCCCrCTGGGACCCC^ 

MART #8

YBKLSABQSPPPYSPAA 
gp100 #41

IVGILLVLMAVVLASLIYRRRLMKQDFSVP 
ATCXyrCGGCATTCTGCrCGTGCTCATGGCTGT^^ 

MART #3 

GIGILTVI LGVLLLIGCWYCRRRNGYRALM 
GGCATTGGCATTCTGACAGTGATTCTGGGAGTGCT^ 

Tyros #31 

YSYLQDSD PDS PQDYIKSYLBQASRIWSWL 
TACTCXn^CCTCCAGGATAGCGATCCCGATAGCTTTCA^ 

MUC1F #6 

QGQDVTLAPATBP ASGSAATWGQDVTSVPV 
CAGGGACAGGATGTGACACTGGCTCCCGCTACCGAACCCGCT 

gp100 #21

TSCGSS PVPGTTDGHRPTABAPNTTAGQVP 

ACCTCCTGCGGAAGCTCCCCCGTCCCXXK^CCACAGACX^ 

MUC1R #3 

LVHNGTSARATTTPAS KSTPFS I PSHHSDT 
TRP2 #32 

BBTPGWPTTLLVVMGTLVAIiVGLPVLLAPL 
GAGGAAACCCCTGGCTGGCCCACAACCCTCCTGGTCG 

gp100 #29

TTTEWVETTARBLPIPEPBGPDASSIMSTB 
ACCACAACCGAATGGGTCGAGACAACCGCTAGGGAACTXX^ 

MC1R #17

GAVTLT I LL6I FFLCWGP P F I* H I* T LIVLCP 
GGOGCTGTGACACTGACAATCCTOCTGGGAATCTTTT^ 

Tyros #33 
LGAAMVGA 

CTGGGAGCCGCTAT 

MC1R #8 

GALVARAAVLQQLDNV X DVI TCS SMLSS L C 
GGCGCTCTGGTCGCCAGAGCCGCTGTXKTrCCAGCAACT 

gp100 #26

MTPBKVPVSBVHGTTLABNSTPBATGMTPA 
ATGACACCCGAAAAGGTCCCCtnCAGCGAAGTGATGGGCACAACCCTCGC^ 

Tyros #2 

QTSAGH PPRACVSSKNLMEKBCC P PHSGDR 
CAGACAAGCGCTGGCCAT1TCCCTAGGGCTTGCGTCAGCTCCAAG 

MC1R #11

ALRYHS IVTLPRAPRAVAAIWVA SVVFSTL 
GCCCIt^3GTATCACTCCATCGTCACCCTCCCCA 1 1 C 1 CCACCCTC 

MUC1R #12

PRBGTINVHDVBTQPNQYKTBAAS RYNLTI 
TTCAGAGAGGGAACXATTAACGTCCACGATGTGGAA 

Tyros #3 

NLMBKECCPPWSGDRS PCGQLSGRGSCQNI 
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AACCTCATGGAAAAGGAAT 
Tyros #32 

IKSYLEQASRINSWLLGAAMVGAVLTALLA 
ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCTGC^ 

MUC1R #5 

PTTLASHSTKTDASSTHHSSVPPLTSSNHS 
CCCACAACCCTCGCCTCCCACTCCACCAAAACCGATGCCT^^ 

MUC1R #15 

SGAGVPGWGIALLVLVCVLVALAIVYLIAL 
AGCGGAGCCGGAGTGCCTGGCTGGGGCATTGCCCT^ 

MC1R #10 

PLGA1AVDRY ISIFYALRYHS IVTLPRAPR 
TTCCTCGGCGCTATOGCTGTGGATAGGTATATCTC^ 

gp100 #40

I» I M P G QBAGLGQVPLIVG I LLVLMAVVLAS 
CTGATTATGCCTGGCCAAGAGGCTGGCCTCGGCCAAGT^^ 

TRP2 #33 

TLVALVGLFVLLAPLQYRRLRKGYTPLMBT 
ACCXriTOlGGCTCrGGTCGGCCTC^^^ 

TRP-1 #5 

L I S P N S V F SQWRVVCDSLEDYDTLGTLCNS 
CTCATTAGCCXrrAACTCCGTGTTTAGCC 

MC1R #2 

LNSTPTA I PQLGLAANQTGARCLEVSI SDG 
CTGAATAGCACACCCACAGCCATTCCCCAACriG 

Tyros #28 

HRPLQBVYPBANAPIGHNRBSYMVPFIPLY 
CACAGACCCCTCCAGGAAGTGTATCCCGAAGCCAATGCCCCT^ 

gp100 #24

BPSGTTSVQV PTTBVI S T A P . V Q M PTABS TG 
GAGCCTAGCGGAACCACAAGCGTCCAGGTCCCCACAAC^^ 

TRP2 #11 

K K R V H P^ D YVITTQHKLGLLGPNGTQPQFAN 
AAGAAAAGGGTCCACCCTCACTATGTGATTACCA 

gp100 #38

LHQILKGGSGTYCLNVSLADTNS LAVVSTQ 
CTGCATCAGATTCTGAAAGGOGGAAGCGGAAOCTATT GC CT CA I^ 

gp100 #30

PBPBGPDASSIMSTESITGSLGPLLDGTAT 
CCCGAACCCGAAGGCCCTGACGCTAGCTCCATC^ 

gp100 #31

SITGSLGPLLDGTATLRLVKRQV PLDCVLY 

gp100 #5

D C W RGGQVSLKVSNDGPTLI GANASFSIAL 
GACTGTTGGAGAGGCGGACAGGTCAGCCTCAAGGTC 

Synthetic Protein: 



WH1K31>YPEHTKAQRlIX>rRGGQVSLJCVSNDPYI IJU*C^REIJ*PRXPPHRTOCCTCHPAGRNGDPF I SS KDLGYD YSYLQDSD PDS PQDYAAP AFLTW 
HRYHLLRLBKDMQEMLQBPS FSGHNRBSYMVPFI PLYRNGDFFI SSKDLGYDIXCLBRDLQRLIGNBSFALPYWHFATGRNBTTBV^ 
PSGTT SVQV PTTBVSTD YYQBI^RDISBMFIiQI YKQGGFLGLSWACMBI SS PGCQP P AQRLCQPVLPSPACQLVDQLGYSYAI DLPVSVEBTPGWPTT 
LLWMGTEDG P I RRN P AGNVAR PMVQRLPB PQrn^OOm^S L VNKECCPRI>GAES ANVCGSQCPRKQ Y KTEAAS RYNLTI SD V SVSDVP F P PSAQAA 
MSPlJ?WG FLLSC LGCiaLPGA(^F PRVM l^^ 

QLTIAFRBGTIHVHDVBTQFGSAATWGQPVTSVP^ FFLCIAIiIICNAIIDPLIYAFH 

SQBLRRTlJagVLKFFWfCXCl-GWFAGYBUGI^^ 

DASSAAWRPAIfiSTAPPVHHVTSASGSASGSASTQIGT^ 



Figure 27 (Cont) 



WO 01/090197 



PCT/AU01/00622 



180/216 

YDHVAVLLCLVVFFIAMLVLMAVLYVKLTGD 

l^IKPRPGSVVVQLTIAVIDVITCSSMLSSLCFUSAIAVDRYISIPyiWP 

PIGHNRCflfNMVSIJUDTNSIAWSTQ^ 

DJU^mtEVTVYHIWGSRSYVPIJUJSSSAFTAVAAIW 

RRNGYRALMDXSLHVGTQCALTRRPWHRI^LLR PYWDWAAMAVQGSQRRLLGSLNSTPTAI PQLGLAA WATIAKNRNLHSP 

MYCFICCTJUCTI^VSQSSMHNALHIYW*^^ 

AVILIXEADPTLISIWSRPSSNBTVCDSIiDDYNHLVTLTRPAIjGST^ 

HDPSGYIABADI£Ynrora)SSGTSSADVEFtXSI*TQYBSGSMDKAA» 

PEHPTCGCIPKNFNLPCQCSC3JFHGFNa^CKFGPW 

TEMFVTNPPGSQKVLPDGQVIWVNNTIINGSQVWGGRWABAPiOT 

YRRRLMKQDFSVPQLPHSSSHWIJ*LPRIIjGI^PNGTQPQFANCSTO 

IXHLAV1GAIXAVGATKVPRHQTGARCLBVS ISDGLFLSL6LVSLVENALSGSMDKAANPSPRNTLEGFAS PLTGIADASSPCGQLSGRGSCQNILLS 

RAPI/3PQFPFTGMHYYVSMDALIXX;SEIWRDI 

SSTEKHAVSMTSSVI^SHSPGSGSSTTTPMFNDINIYPI^^ 

U^STBIXSPIRJWPAGlWWVlOrrilNGSQVWGG^ 

VPFSVSVSQIJlLBia)MQBMLQBPSFSIJnfWNPATGKNVCDIVPFWP 

VRRN I FDLSAPBKDKF FA YLTLAKHT I S SDKKGHGHS YTTAE EAAG I G I LTVI IX3VLLLI FVYVWKTWGQYWQ VIjGGPVSGLS I GTGRAMGG PVSGLS 
IGTGRAMIXnwmBVTVyHRRGISTAPVQMPTAKSTGKITBKV^ FWLHSFTDAI PD 

BWMKR FN P P ADA W PHMLARACQHAQG I AR1jHKR(^PVHQGFGIJCLLTVLTVVTG SGHASST PGG EKETSATQRS FCSCPI GEN S PLLSGQQVAATPS F 
RNAI^FDKAIXrrLDSQVMSIJiNLVHSHQSLCNGTP 

HSLSPQBREQPI/SALDIJ^BIjAP IGHNRMYNMV PQLSTGVSFFFLS FH ISH 

LQFNSSLED PYHTHGR YVPPSSTDRS PYEKVSAGl^SSLI^l^LGLVSLVENALVVATIAKNRNLHS PMS FLNGTNALPHSAAND P I FWLHSFTDA 
I FAVCQCRR KNYG QLDI FPARDTYHPMSBYPTVFFIAMLVIJIAVLYVHMLARAC^ 
ATGXNVCDICTDDLMGSRSNFDSTITDQVPFSVSVSQLRAIiDGG)^ 
LSYTNPAVAAASANIJmtt PIGTYGQMKKOT 

AVPSGEGDHHAFVDS I FEQWLQRHR PI^EVYPBANAPI CTDDLMGSRSNFDSTLX S PNSVFSQ^VVCFPARinTHPMSBYPTYHTHGR YVPPSSTDR 
SYTN PA VAAASANLAAEHPTCGCI FKNFNItPLALI I CNAI I DPLI YMSQVQGSANDPI FLLHHAFVDS I FEQWLQRRNSMKLPTLKD I RDCLSLQKPD 
NPPFFQNSLI SRALWTHTYLE PG PVTAQWLQAAI PLSFAliPYWNFATGRNFXJDVCTDQLFGAARPDVI GALLAVGATKVPRNQDWIjGVSRQLRTKA 

aldgghxhfzjwqpltfai^lhdpsgyiaecdvctdqlfg 

RSLHOTiAHLFUIGTGGQTHI^SDVSVSDVPPPFSAOSGAGVPGTire 

^PIGTYGQMKNGSGTNVLBTAVILLLBAGAljVARAAVIiQQLD 

PNAPPASV1>SSHSPGSGSSTTQGQDVTIAPATEPASDB 

HRYHLLCS^RDLQRlJGNBRPMVQRiPBPQD^ FVLLHTPTDAVFDEWLRRYNADI STFIiGAESANVCGSQQGRGGCT 

F^HlADTRPWSGYNCGIXnCFGVrePNCERKKPPVIRQNI^ 

YHSGCKILPGA(X^FPRVQfrVDSLVNKECCPRQLPHSSSHWIJll>PRIFC^ 

GVSRQLRTKAWNRQLYPEWTEAQRIjIWRDIDFAHEAPAFLP*^^ 

RRCPQBGFIMRDSKVSUfVTSASGSASGSAST^^ 

AFVTWHRYHAAMMAVLYCIiWSFQTSAGHFPRACVSSKAFBXT^ 

CESABILQAVPSGEGDAFELTVSCQGGLPKETVCDSIjDDYN}^ 

QS PPPYSPAAIVGIIiVIJttVVIJ^IJ YRRJU^QPFSVTC^ KSYXEQASRIWS 
WLQG QDVTIiAPATBPASGSAATWGQPVTSVPVTSOGSSPVPGTTO 

PTTLLVVMGTLVALVGLFVZiLAPLTTTBWVBTTARE^ PBPEGPDASS IMSTEGA VTLTI LLGI PFlO^PFPI^TLIVI^PLGAAMVGAVLTAlJi 

AGLVSU^CRHKRKQIJ*3AI,VARAAVU^^ 

CPPWSG0RAUlYHSIVTM»RAPRAVAAIWVASVVFSTI*FRE^ 

I KSYXEQASRI WSWLIXSAAMVGAVLTALIAPTT^ VYLI ALFLGAIAVD 

RYISIFYALRYHSIVTI^RAPRMMPGQBAGIXX]VPLIW^ 

SLKDYDT1X5TLQISUJSTPTAI PQI^IJ^IHT'CARCLBVS I5ZIGHR PLQBVYPEANAPIGKNRBSYMVPPI PLYBPSGTTSVQVPTTBVI STAFVQMP 
TAF^HXiaCKRVH PD YVITTQHWLGLLG FNGTQP^ PEGPDASS IMSTES ITGSLGPLLDGTATS I 

TGSLGPLU)GTATLRLVXRQVPU>CVLYI)C^ 

Synthetic DNA: 



TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGA CGATCCCTATAT 

CCTCAGGAATCAGGATGACAGAGAGCTCTGGCCTAGGAAATIV 

TTATCTCCAGCAAAGACCTCOGCTATGACT^ 

CACAGATACCATCTGCrCAGGCrCGAGAAAGACATGCAGGAA^ frf 
CATTCCCCTCrACAGAAAOGGAGAL^'rrrrtlATTAGCTC^ 

KT&tTr^r^^i^iMrriACTCGM^i i i\,QCACAGGCAGAAACGAAACCACAGAGGTCGTGGGA 

CCCTCCGGOVCAACCTCOGTGCAAGTGCCTACt^^ 

CTATAAGCAAGGCGGATTCCTCGGCCTCAGCAATGCCTGTATGGAA^ 

TCCCCTCCCCCGCTTCCCAA^ 

CTGCTCGTGGTCATGGGAACCG^ 

CGTCGCCCAATGCATGACCGTCGACTCCCTG 

GAAACCAATACAAAAttX3AAGOCGCTAGCAGATACA^ 

ATGTCCCC CCTCray rGGGGCTTTCT^^ 

CACATCGGATTTCGGAGACTtXAGCGGAACCCK^TCT 

ATCAATGTGCATGAOGTOGAGACACAGTTTCGCTCC G COGCTACCTG 
GCCTGTGACAAGGCCTGCCCTCG GC rCC A CCACACCCCC^ 

GAGCCGTCACCCTCACCATTCTGCTCGGC A1 1 1 1CI rrCTGTGTCTGGCTCTGATTATt^GTAACGCT 
AGCCAAGAGCTCAaSAGAACCCTCAAGGA^ 

CAAATTCGGATGGACAGGCCCTAACTGTCrGTCCCTGCAAAAG^ 1 lVriTCA GAATAGCAC A Tni"l\X"re 
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AACX^TTTGACAAAGCCG^ 

GACGCTAGCTCCGCCGCTAACAGACCCGCTCTGGGAAGCACftGCCCCTC 

CACATGCAATGC^CATACGAAGGCCTCCTGAGAAGGA^ 

ATCACTCCAGCGTCCCCCCTCTGACAAGCTCC^ 

TACG ATCACGTCGCCGTCCTGCTCTGCCTCGUt^TC^^^ 

CTTTACCATTCCCTATTGGGATTGGAGAG^ 

TCGACTGTGTGCTCTACAGATACGGAA<^^ 

CTGTCCAAC ATTAAGTTTAGGCCTGGCTCCGTGt^ 

CTTTCTGGGA GCCATTX KrCGTCtlACAGATAOVTTAG 

CTGACGTCGAGTITTGCCTCAGCCTCACCCAATACGA^ 

CCCATTGGCCATAACAGACAGTATAACATG6TGTCCCTGGCTGACACA 

CGGACTGGGACAGGTCCCCCTCGGCCCTGTGACAGC(X^ 

CAACCGATGGCCATAAGTTTGGCTTTTGGGGACCCAATTGCACAGA 

GACAAACTGGGAACTCATACCATGGAGGTCACXGTCTACXATAGG 

CGTCGCCGCTATCTGGGTGtXrrAGCGTCG^ 

ATAGCCAAGTGATGAGCCTCCACAATCTGGTCCACT^ 

A<XyVGAAACGGATACAGAGCCCTCATGGATAAGTCCCTG^ 

gTGCGAGCAAGAGATTCAGAAACTGAC^GGCG^ 

TCCTGGt^AGCCTCAACTCCACXXXrrACCGCTATCCCTCA 

Altn^TTGCTTTATCTGTTGCCT 

CCAAGTGCAAGGCTCCGCCAATGACCCTATCTT^^ 

^TCGTCTGCT^ 

GCCCTCATCC^^ 

CCATCT^CACCCTCACCAG^^ 

AAAAGTGTGACATTTGCAOVGACGAATACATGGGCGGA^ 

CACGATCCCTCCGGCraTCTGGCTCAGGCTGA 

GTCCCTGACACAGTATGAGTCC^^ 

CCATCGCTCTGAATTTCCCTGGCTCrc 

CCCGAACACCCTACCTGTGGCTGTATCTTTAAG^ 

CGGATTCTGGGGCCCTAACItn-ACCGAAAGGAGA^ 

AAAGGTATACCGAAGAGGCTGCCGTO 

ACCGAAATGTITGTGACAAACTTTCCCGGAAGC^^ 

GTGGGGCGGAAGGCCTACCGCTGAGGCTCCCAATA^^ 

cra ^C ftC CCGGAG<XXyiAAAGGAA AOT 

TACaGAAGGAGACTGATGAAGCAAGACTTTAGCGTCCCCC^ 

CCCTAACGG^^ 

ATACCCCITXXriTTTACTCCAACTCCACCA^ 

CTGCTCCACCTCGCra 

C ^ TG GCCTCITCCTCAGCCTCGG^ 

TCGAGGGATTCGCTAGCCCTCTGACAGGC^Tra 

AACGCTCCCCTC^GCCCTCAGTTIXX^^ 

CTTTGCCCATGAGGCTCCCGCTr^^ 

GCCAATGCACAGAGGTCAGGtXTItiAC^CAAG^ 

TCCAGCACAGAGAAAAACGCTGTGTCCATGACAAGC^^ 

TATCAAXAT^^ 

ACGATGCCTGTATCI I i^uvATGGCGGACCCTCTCCCTCCGGCTCCT^ 

CTCTGCAATAGCACAGAGGATGGCCCTATCAGAAGG^^ 

AGGCCAACCCXnCTACCXrrCAGGAAACCGATGACSCTT^ ^ 

AGAAACTGTCTG<X3GAACAG^^ 

CnXrCCCTTTAGCGTCAGCGTCAGCCAA 

TACCGGAAAGAA'WnCTGl^ 

AGGCTGCCTCCT CCGTGrrA TGA Cri 'l"l' ltXri ^^ 

<nX3AGAAGGAATATCTTTGACCTCAGC^ 

C^TGGCCAT^ 

AAACCTGGGGCCAATACT^ 

A *^GGAACCGGAAGGGCTATGCT<XiGCACA^ 

eXSAA AGCACA GGCATGACCCCTGAGAAAGTCCC^ 

ATAACTCCCACCAAAGCCTCTGCAATGGCACA^ 

GAGTGGATCAAAAGGTTTAACCCTCCCGCrGAC^ 

GCAAAGGCCTGTGCATCAGGGATTCGGACTtyu^VCT 

AAGAGACAAGCGCrACCCAAAGGTCCTTCTGTAGCTG^ 

AGGAATGCCCTCGAGGGATTCGATAAOXTGACGGAACC^^ 

CGGAACCCCTGAGGGACCCCTOWX5AGA 

AACTGTTTCTGACAAGCGATCAGCT 

CA CTCCCIGT CCCCCCAAGAGAGACAGCAAT^^ 

GU/mViTfCtXtXTCltiACAAACGA^ 

CAGAGTQGGTGGAAACCACAGCO^GAGAGCTCCCCATTA^ 

CTGCAATTCAATAGCTCCCTGGAAGACCCTTACXrATAC^ 

OGGAAACGGAGGCTCCAGCCTCCTXn^^ 

TCCACTCCCCX^TGAGCTTTXnT^TGGCACAAAC^ 

ATCTTTGCCGTCTGCCAATGCAGAAGGAAA^ 

G'ATl'rTLVAtXJCCATGCTCGTCCTGATGGCCGTCCTC 
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CCTTCAGAAACACAGTGGAAGGCT^^ 
GCCACftGGCAA^ 

GTCCGTGTCCCAGCTCAGGGCTCTGGATGGCGGAAACAAACAC^ 

TCGA<^TCCCTCCACC^ 

CTGTCCTACACAAACCCTGCCGTCGCCOT 

CATGTTCAATGACATTAACATTrACGATCTGTTTGTGTGGA^ 

TCAAGGGAGGCTCCGGCAC^TACTGTCT^^ 

<KTCGTCCCCTCCGGCGAAGGCGATCACCATGCCTTTC 

GGCTAACGCTCC^ 

TCGTGTGTTTCCCTGCCAGAGACACATACCATCCCATGAGC^ 

AGCTATACCAATCCOGCTGTGGCTGCCGCTAGCGCTAACCTCGCOT 

CCTCATCATTTGCAATGCCATTATCGATCCCCT 

TCGACTCCATCTTTGAGCAATGGCTCCAGAGAAC^ 

AACCCTCCCTTTTTCCAAAACTCC^^ 

GGCTGCCATTCCCCTCAGCTTTGCCCTCCXXrrATTGGAA 

GACCCGATGTGATTGGCGCTCTGCTCGCCGTCGG 

GCCCTCGACGGAGGCAATAAGCATTTCCTCAGGAATCAGCC^^ 

GTGrrACCGATCAGCTCTTCGGAGCCGCTAGGCCTGAC^ 

AAC^CGCTCACTTTATCTATGGCTATCCCAAAAA 

AGGTCCCTC^TAACCTCXSCCCATCTGTITC^ 

CTTTAGCGCTCAGTCOGGCGCTCGCGTCX^ 

TCGCX3kAAAAGAGAGTGCATCCCGATTACGTCATCA^ 

GTGATTCCCATTOGCACATACGGACAGATGAA<^ 

GGCTAGGGCTGCCGTCCTGCAACAGCTCGACAATCT^^ 

GAAAGAATTACGGACAGCTCGACATTTGCCCTCAGGAA^ 

C CCRA TGCCCCTCCCGCTAGOGTCCTCT^^ 

GCCTGCCTCCGACGAATGGATGAAGA<^T 

'IXAX'Cri M ltJACTCCCAGGAACTGAGAAGGACACTGAAAGAG(^^ 
CATAGGTATCACC^^ 

TCAGTGTCTGGAAGTGGGACTGTTTGACACACCCCC^ 

ACGAATGGCTCAGGAGATACAATGCCGATATCTCCACCTTT 

GAAGTGAGAGCCGATACCAGACCCTGGAGCGGATACAATTGC^ 

CATCAGACAGAATATCCATAGCCTXXACCTCTTCCTC^ 

CCTTTACCGATGCCGTCCGCCTCG TGTCC ^ 

TATCACTCCGGCTGTAAGATTCTGCCTGGCGCTCAGGG^ 

AC AGCT CCCC^TAGCTCCAGCCATTGGCTCAGGCTCCCCA 

GCCCTTGCCCTAGCGGAAGCTGGACC^ 

GGACrGTCCAGGCAAXrTCAGAACCAAAGOCTGGAACAGACAC^^ 

T CACG AA<XX:CCTGCCTTTCTGCCTTGGCAT^^ 

'1C'14TCA^CTCCTGCTC^ 

AGAAGtnXSTCCCCAAGAGGGATTOGATCACAGAGACTtrCAA^ 

TCCATAACGCTCTGCATATCTATATGAATGGCACAAGGGAT^ 
(XlU"IHLt»ll^CCTGGCACAGATACCAT^ 

AGCCTGTGTGTCCAGCAAAGCC1 1 l^AGCTCACCgn^GCTGTCAGGGAGGCCTCCC CR AA^ 

CCCCTGCtXlAAGTGGATGACAGA<3\GTCCTGGCCT 

TGTGAGTCOGOCGAAATCCTCOtfSGCTGTCCCTAG 

CTGCGATAGCCTCGACGATTACAATCACCTCCT^ 

CCCCTCTGGC^CttX^'l^ 

OtAACCCC TCCCCCTrACT CX^ 

GAAACAGGATTTCTOX^^ 

ATAGGGCTCTGATCrrACTCCTACCTCCAGGATAGCGAT^ 
TGGCTCCAGGGACAGGATtTTCACACTCGC^ 

ATGGCACAAGCGCTAGGGCTACCACAACCCCTGCCTCCA 

CCCAC^CCCTC CT TXmCTO ^ ^ 

CXaCTAGGGAACT^XICfATCCCTGAGCCTGA^ 

TTrFCCTCTGCTGGQGCCCTTTCTTTCTGCA 

GCCGGACrGGTCAGCCTCCTGTGTAGGCATAAGAG^ 

CGATGTX^TTACCT(n7^GCTCCATGCTCAGCTC^^ 

CX!ACCCCTGAGGCTACCGGAATGAQU!CCGCTCAGAO 

TGCCCTCCCTGGAGCGGAGACAGAGCCCTCAGGTATC^ 

GGTCTTCTCCACCCTCTTCAGAGAGGGAAC^ 

TCACX^TTAACCTCATGGAAAAGGAATGCTGTCCCCC^^ 

ATCAAAAGCnVTCTQGAACAGGCTAGCACAATCTCGAGCTGGC^ 

CCTCGCCTCtX^CTCCACCAAAAC^ 

CTG CC ItAXX i CATT GC CCTCCTGGTC 

AGtn^TATC'ltJCAllJ'lTriACGCTCTG^ 

CCAAGTGCCTCTGATIGTGGGAATCCrCCTGGTCC^ 

TTCTGCAATACAGAAGGCTCAGGAAAGGCTATACCCCTCTGA 

AGCCTCGAGGATTACGATACCCTCGGCACACTGTCT 

CGCTAGGTtnxritXSAAGTGTCCATCTCCGACGG^ 
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ATATtCTCCCCTTTATCCCTCTGTATGAGCCT 
ACCtXTGAGTCCACCGGAAAGAAAAGGGTCCACCCTGACT^ 
GTITCaMTCTGCATCAGATrcrcAA^ 
AACCCGAA<XXGAAGGCCCT^ 

CGGACAGGTCAGCCTCAAGGTCAGCAATGACGGArc^ 
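
The listing above gives the labelled segments in scrambled order and then joins them, in that same order, into the "Synthetic Protein" string. A minimal sketch of that step follows, assuming a uniform random shuffle; the function names `scramble` and `synthetic_protein`, the `seed` parameter and the placeholder sequences are illustrative assumptions, not taken from the Scramble program itself.

```python
import random


def scramble(segments, seed=None):
    """Return a shuffled copy of (label, peptide) pairs.

    Assumption: a uniform random permutation; the actual ordering rule
    used by the Scramble program is not stated in the listing.
    """
    rng = random.Random(seed)
    shuffled = list(segments)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


def synthetic_protein(segments):
    """Concatenate the peptide of each (label, peptide) pair in order."""
    return "".join(peptide for _, peptide in segments)


if __name__ == "__main__":
    # Placeholder peptides (hypothetical, not sequences from the figure).
    demo = [("gene1 #1", "AAMKT"), ("gene1 #2", "KTLLV"), ("gene2 #1", "AAMGS")]
    order = scramble(demo, seed=0)
    print([label for label, _ in order])
    print(synthetic_protein(order))
```

Because every segment is kept and only the order changes, the synthetic protein has the same total length as the sum of the segment lengths.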

Melanoma cancer Specific Savine Scramble process 

Scramble - Output File 

Scramble version : 0.1 beta, 08/02/1999 

Num. genes : 10 

Num. segments : 121

Segment length : 30 

Segment overlap : 15 
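
The parameters above (segment length 30, overlap 15) imply a sliding window that advances 15 residues per segment, which matches the offsets 1, 16, 31, ... reported for each gene below. The sketch that follows illustrates this windowing. The `pad="AA"` flanking is an inference from the listing, where first segments begin with "AA" and last segments end with "AA"; the function name and signature are hypothetical.

```python
def overlapping_segments(protein, length=30, overlap=15, pad="AA"):
    """Split a protein into overlapping windows.

    Returns a list of (segment_number, offset, peptide) tuples, with
    1-based offsets as in the Scramble output. The `pad` flanking is an
    assumption inferred from the listing, not a documented option.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    padded = pad + protein + pad
    step = length - overlap
    segments = []
    start = 0
    while True:
        peptide = padded[start:start + length]
        segments.append((len(segments) + 1, start + 1, peptide))
        if start + length >= len(padded):
            break  # this window reached the end of the padded sequence
        start += step
    return segments


if __name__ == "__main__":
    # Hypothetical 43-residue protein; real input would be a parent antigen.
    toy = "M" + "C" * 42
    for number, offset, peptide in overlapping_segments(toy):
        print(number, offset, peptide)
```

With a 43-residue input the padded length is 47, so the sketch yields three segments at offsets 1, 16 and 31, the last one shorter than 30 residues.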

Segments in original order: 



Gene : BAGE 

Segment # : 1 
Offset : 1 
1st Codon : 1

AAMAARAVFLALSAQLLQARLMKBBSPVVS
GCCGCTATGGCTGCCAGAGCCGTCTTCCTC^



Gene : BAGE 

Segments : 2 
Offset : 16 
1st Codon : 1 

LLQARLMKBBSPVVSWRLEPBDGTALCFIF 
CTGCTCCAGGCTAGGCTCATGAAAGAGGAAAGCCCTP 

Gene : BAGE

Segment# : 3
Offset : 31
1st Codon : 1

URL
TGGAGAC



Gene : GAGE-1 

Segment# : 1
Offset : 1 
1st Codon : 1 

AAMSWRGRSTYRPRPRRYVBPPBMIGPHRP 
GCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCC^ 

Gene : GAGE-1 

Segments : 2 
Offset : 16 
1st Codon : 1 

RRYVBPPBM IGPMRPEQFSDEVE PATPBBG 
AGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGAGA^ 

Gene : GAGE-1 

Segments : 3 
Offset : 31 
1st Codon : 1 

BQFSDBVBPATPB BGEPATQR QD PAAAQBG 
GAGCAAT 



Gene : GAGE-1 

Segments : 4 
Offset : 46 
1st Codon : 1 

EPATQRQDPAAAQ 
GAGCCIGCCACACAGAGACAGGAT 



B G B 



DBGASAGQGPKPBA 
^CGAAGGOGCTAGCGCTGGCCAAGGCCCTAAGCCTGAGGCT 



Gene : GAGE-1

Segment# : 5
Offset : 61
1st Codon : 1




BDBGASAGQGPKPBADSQEQGHPQTGCECB 
GAGGATGAGGGAGCCTCCGCCGGACAGGGACXTCA 



Gene : GAGE-1

Segment# : 6
Offset : 76
1st Codon : 1

D S Q

B Q G H 



P Q T G C B CBDGPDGQE 



D P P N P B B 



GACrCCCAGGAACAGGGACACCCTCAGACAGGCTGTGAGTGTGAGGATC 

Gene : GAGE-1

Segments : 7 
Offset : 91 
1st Codon : 1 

DGPDG .QBMDPPNPEEVKTPEEEMRSHYVAQ 
GACGGACCCGATGGCCAAGAGATGGACCCTCCCAATCCCGAAGA 



Gene : GAGE-1

Segment# : 8
Offset : 106
1st Codon : 1

V K T

P B B B 



M R S H Y V 



Q T G 



LWLLMNNCFLNL 



GTGAAAACTCCTGAGGAAGAGATGAGGTCCCA^ 

Gene : GAGE-1

Segment# : 9
Offset : 121 
1st Codon : 1 

TG I L W L L MNNCFLNLSPRKPAA 
ACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCT^ 

Gene : gp100In4

Segment# : 1
Offset : 1 
1st Codon : 1 

AASWSQKR S P VYVWKTWGBGLPSQPIIHTC 
GCCGCTAGCTGGAGCCAAAAGftGAAGCTTTGTCT 

Gene : gp100In4

Segment# : 2 
Offset : 16 
1st Codon : 1 

TWGBGLPSQPIIHTCVYFFLPDHLSFGRPF 
-fiCCTGGGGCGAAGGCCTCCGCTCCCAGCCTATCATTCAC 

Gene : gp100In4

Segment# : 3
Offset : 31 
1st Codon : 1 

V Y F P L P D H I, S FGRPFHLNFCDPLAA 
GTGTAT * * *CI a iviw. TOACCATCTGTCCTriXJUAftG^ 




Gene : MAGE-1

Segment# : 2
Offset : 16
1st Codon : 1



AQQEALGLVCV 
lGAGGCTCTGGGACTGGTCTGCGTC 



SSSPLVLGTL 



Gene : MAGE-1

Segment# : 3
Offset : 31
1st Codon : 1
QAATS SSSPLVLGTL 



BBVPTAGSTDPPQSP 
^CAGCCGGAAGCACAGAOCCTCCCCAAAGCCCT 
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Gene : MAGE-1 

Segment# : 4
Offset : 46
1st Codon : 1

EEVPTAGSTDPPQS PQ6ASAPPTTINPTRQ 
GAGGAAGTGCCTACCGCTGGCTCCACCGATCCXrCCT 

Gene : MAGE-1

Segment# : 5 
Offset : 61 
1st Codon : 1 

QGASAFPTTINFTRQRQPSBGS SSREEEGP 
CAGGGAGCCTCCGCCTTTCCCACAACCATTAACTTTACCAGA 

Gene : MAGE-1

Segment# : 6
Offset : 76 
1st Codon : 1 

RQPSKGSSSREEEG PSTSCI LESLFRAVIT 
AGGCAACCCTCCGAGGGAAGCTCCAGCAGAGAGGAAGAGGGACCCTCCAC 

Gene : MAGE-1

Segment# : 7
Offset : 91 
1st Codon : 1 

STSCI LESLFRAVI TKKVADLVGFLLLKYR 
AGCACAAGCTGTATarrCGAGTCCCTGTTTAGGGCTGTGAT^ 

Gene : MAGE-1 

Segment# : 8
Offset : 106 

1st Codon : 1 

KKVADLVGFLLLKY RAREPVTKABMLBSVI 
AAGAAAGTGGCTGACCTCGTGGGATTCCIXXrnxn'CA 

Gene : MAGE-1

Segment# : 9
Offset : 121 

1st Codon : 1 

ARB PVTKAEMLESV IKNYKHCFPE I FGKAS 
GCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGA^ 

Gene : MAGE-1

Segment# : 10
Offset : 136 
1st Codon : 1 

KNYKHCFPBIFGKASBSLQLVFGIDVKBAD 
AAGAATTACAAACAC"ltjfTTCCCT G AG Arrrit3 GGAAAGGCTAGC^ 

Gene : MAGE-1

Segment# : 11
Offset : 151 
1st Codon : 1 

ES LQLVFGIDVKEADPTGHSYVLVTCLGLS 
GAGTCCXrreCAACreGTCTTCGG A ATOGATGTCAA^ 

Gene : MAGE-1

Segment# : 12
Offset : 166 
1st Codon : 1 

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG 
CO»CAGGCCATAGCrATGTGCTOGTGACATCCCTCG^ 

Gene : MAGE-1

Segment# : 13
Offset : 181 
1st Codon : 1 

YDGLLGDHQIMPKTGFLI I V L V M IAMBGGH 
TACGA'rGGCCltXrrGGGAGACAATCAGATTATGCXrrAAGA 

Gene : MAGE-1
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Segment# : 14 
Offset : 196 
1st Codon : 1

P L I IV LVMIAMBGGHAPBBBIWEELSVMBV 
TTCCTCATCATTCTGCTCGTGATGATCGCTATC 



Gene : MAGE-1

Segment# : 15
Offset : 211
1st Codon : 1

A P E B

E I H 



EELSVMBVYDGRBHSAYGB 



R K L 



GCCCCTGAGGAAGAGATTTGGGAAGAGCTCAGCGTCATGGAAGTGTATC 



Gene : MAGE-1 

Segments : 16 
Offset : 226 
1st Codon : 1 

YDGRBHSAYGB PRKLLTQDLVQBKYLEYRQ 
TACGATGGCAGAGAGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTC^ 



Gene : MAGE-1 

Segments : 17 
Offset : 241 
1st Codon : 1 

LTQDIiVQBKYLBYRQVPDSDPARYBP 
CTGACACAGGATCTGGTCCAGGAAAAGTATCTGGAATACAG 



L W G 



Gene : MAGE-1

Segment# : 18
Offset : 256
1st Codon : 1

V P D

D P A R 



GTGCCTGACTCCGACrCTGCCAGATACGAAT 



Gene : MAGE-1

Segment# : 19
Offset : 271 
1st Codon : 1 
RALABTSYVKVLBYVI 
kCAAGCTATGTGAAAGTGCTCGAGTATGTGAT 



K V S A 



R P P P 



SLR 



Gene : MAGE-1 

Segments : 20 
Offset : 286 
1st Codon : 1 

IKVSARVRPPPPSLREAALR 
ATCAAAGTGTCCGCXIAGAGTGAGATTC1TT 



BBBBGVAA 



Gene : MAGE-3

Segment# : 1
Offset : 1 
1st Codon : 1 

AAM P h BQRSQH C K PEBGLEA RG BA LG L • V G A 
GCCGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAAG 



Gene : MAGE-3

Segment# : 2
Offset : 16
1st Codon : 1



BGLBARGBALGLVGAQAPATB 



QBAASSSS 



Gene : MAGE-3

Segment# : 3
Offset : 31 
1st Codon : 1 

QAPATBBQBAASSSSTLVBVTLGBVPAABS 
CAGGCTCCCGCTACCGAAGAGCAAGAGGCTGCCTCC^ 

Gene : MAGE-3

Segment# : 4
Offset : 46 
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1st Cod on : 1 

TLVBVTLGEV 
ACCCTCGTGGAAGTGACAC 






PAABSPDPPQSPQGASSLP 




Gene 
Segments 
offset 
1st Codon 



Gene : MAGE-3

Segment# : 6
Offset : 76 
1st Codon : 1 

T M N Y P L W 
ACCATGAACTAT 



LPTTMNYPI>WSQSYEDSS 
'ACCA<IAATGAATTACCCTCTGTGGAGCCAAAGCTATGAGGATAGCTrc 



SQSYEDSSHQEBEGPSTFPDLES 
^CTCCAGCAATCAGGAAGAGGAAGGCCCTAGCACATTC^ 

Gene : MAGE-3

Segment# : 7
Offset : 91
1st Codon : 1

NQBBBGPSTFPDLES EPQAALSR KVAB LVH
AACCAAGAGGAAGAGGGACCCTCCACtTTTTCCCGATCTG^



Gene : MAGE-3

Segment# : 8
Offset : 106 
1st Codon : 1 

BFQAALSRKVA 
GAGTTTCAGGCTGCCXrrCAGCAGAAAGGTC 



B L V H 



LLLKYRAREPVTKA 
ATACAGAGCCAGAGAGCCTGTGACAAAGGCT 



Gene : MAGE-3

Segment# : 9
Offset : 121
1st Codon : 1

F L L

K Y R A R E P 



VTKABMLGS 



V V G » W Q Y 

VrrGGCAATAC 



Gene : MAGE-3

Segment# : 10
Offset : 136 
1st Codon : 1 

BMLGSVVGNWQYF 
GAGATGCTGGGAAGCGTCGTGGGAAACTGGCAGTAT 



Gene : MAGE-3

Segment# : 11
Offset : 151 
1st Codon : 1 

V I F S K A S 
GTGATTTTCTC 



IFSKASSSLQLVFG 
ATC1TTAGCAAAGCCTCCAGCTCCCTGCAACTGGTCTTCGGA 



SSLQLVPGIE 



LMBVDPIGHLYIF 
VTCGAAGTGGATCCCATTGGCCATCTGTATATCTTT 



Gene : MAGE-3

Segment# : 12
Offset : 166 
1st Codon : 1 

IBLMBVDPIGHLYIFATCLGLSYDGLLGDN 
ATCGAACTGATGGAGGTCGACCCIATCGGACACCrCTACA r 1 ITllAJTACCTGU^TGGGACTGTCCTACGATGGCCT^ 



Gene : MAGE-3

Segment# : 13
Offset : 181
1st Codon : 1

A T C
GCCACAT

LGLSYDGLLGDNQI MPKAGLLI I V I* A I 
lTGACGGACTGCTCGGCGATAACCAAATCATGCCC^ 



Gene : MAGE-3

Segment# : 14

Offset : 196 

1st Codon : 1 

Q I M PKAGLLII 



V 1* A I I A R 



GDCAPEBKIWB 
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CAGATTA' 




kTTA' 



rATCATTGCCAGAGAGGGAGACTCTGCCCCTGAGGAAAAGAT^ 



Gene : MAGE-3

Segment# : 15
Offset : 211 
1st Codon : 1 

IARBGDCAPEEKIWEELSVLEVFEGREDSI 
ATaXTTAGGGAAGGCGATTGCGCTCCCGAAGAGAAAATCT^ 

Gene : MAGE-3

Segment# : 16
Offset : 226
1st Codon : 1

ELSVLEVFEGREDS I LG'DPK KLLTQHFVQE 
GAGCTOVGCGTCCTGGAAGTGTTTGAGGGAAGGGAA 

Gene : MAGE-3

Segment# : 17
Offset : 241 
1st Codon : 1 

LGDPKKLI#TQHFVQENYLEY RQVPGSDPAC 
CTGGGAGACCCTAAGAAACTGCTCACCCAACACn^ 

Gene : MAGE-3

Segment# : 18
Offset : 256 

1st Codon : 1 

NYLEYRQVPGSDPACYEFLWGPRALVETSY 
AACTATCTGGAATACAGACAGGTCCCCGGAAGCGATCCCGCT^ 

Gene : MAGE-3

Segment# : 19
Offset : 271 

1st Codon : 1 

Y B G P R A U V BTSYVKVLHHMVKISGG PH 

TACGAATTCCTCTGGGGACXXAGAGCCCTCXnt3GAAA^ 

Gene : MAGE-3

Segment# : 20
Offset : 286 
1st Codon : 1 

VJCVLHHMVKI SGGPHISYPPLHEWVLRBGE 
GTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAG^ 

Gene : MAGE-3

Segment# : 21
Offset : 301 
1st Codon : 1 

ISYPPLHBWVLRBGBBAA 
ATCTCCTACCC7CCCCTCCACGAATGGGTCCTGAGA 

Gene : PRAME

Segment# : 1
Offset : 1 
1st Codon : 1 

AAMERRRLWGSI QSRYI SMSVHTS PRRLVE 
GCCGCTATGGAAAGGAGAAGGCTCTGGGGAAGCATTCA^ 

Gene : PRAME

Segment# : 2
Offset : 16 
1st Codon : 1 

YISMSVWTSPRRLVELAGQSLLKDEALAIA 
TACATTAGCATGAGCGTCTGGACAAGCCCTAGGAGACTGGTCGAGCTC^ 

Gene : PRAME

Segment# : 3
Offset : 31 
1st Codon : 1 

LAGQSLLKDEALAIAALBLLPRELFPPLFM 
CTGGCTGGCCAAAGCCTCCTGAAAGACGAAGCtX^ 
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Gene : PRAME 

Segment# : 4
Offset : 46
1st Codon : 1

ALBLLPRBLFPPLFMAAFDGRHSQTLKAMV 
GOCXTCGAGCTCCTGCCTAGGGAACTCm 

Gene : PRAME 

Segments : 5 
Offset : 61 
1st Codon : 1 

A AF P GRHSQTLKAMVQAWPFTCLPLGVLMK 
GCCGCTTTCGATGGCAGACACTa:CAGACACTGAAA^^ 

Gene : PRAME 

Segments : 6 
Offset : 76 
1st Codon : 1 

Q A W P F T CLPLGVLMKGQH LHLETFKAVLDG 
CAGGCTTGGCCTTTCACATGCCTCCCCCTC^ 

Gene : PRAME 

Segments : 7 
Offset : 91 
1st Codon : 1 

GQHLHLETFKAV LDGLDVLLAQEVRPRRWK 
GGCCAACACCTO^CXriXXy«^CATTCAAAGCC^ 

Gene : PRAME

Segment# : 8
Offset : 106 
1st Codon : 1 

I> D V LLAQBVRPR R W KLQVLDLRKNSHQDFW 

CTGGATGTGCTCCTGGCTCAGGAAGTGAGACXC^ 

Gene : PRAME

Segment# : 9
Offset : 121 
1st Codon : 1 

L 0 V JL- D l r k n S H Q D F W TVWSGHRASLYSFPE 
CTGCAAGTGCTCXSACCTCAGGAAAAACrCCCACCAAGACTT^ * 



Gene : PRAME

Segment# : 10
Offset : 136 
1st Codon : 1 

TVWSG N R A SLY SFPBPEAAQPMTKKRKVDG 
ACCGTCTGGTCCGGCAATAGGGCTAGCXrrcTACT 

Gene : PRAME

Segment# : 11
Offset : 151 
1st Codon : 1 

P B AAQPMTKXRKVDGLSTBABQPFIPVBVIi 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGC^ 1 ITAltXVlXrrGGAAGTCCTC 

Gene : PRAME

Segment# : 12
Offset : 166 
1st Codon : 1 

LSTBAEQPPI PVBVLVDLFLKEGACDBLFS 
CTXTTO^CCGAAGCCGAACAGCCm^ 



Gene : PRAME

Segment# : 13
Offset : 181
1st Codon : 1

Y L I BKVKRKKNVLRL 
\TCTGATTGAGAAAGTGAAAAGGAAAAAGAATGTGCTCAGGCTC 



Gene : PRAME 

Segments : 14 
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Offset : 196 
1st Codon : 1

YLIBKVKRKKNVLRLCCKKLKI FAMPMQDI 
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAAOmxrrGAGACTGT^^ 

Gene : PRAME

Segment # : 15 
Offset : 211 
1st Codon : 1 

CCKKLKIFAMPMQDIKMI LKMVQLDSIEDL 
TGCIXn'AAGAAACrcAAAATCTTTGCCATGCCC^ 

Gene : PRAME

Segment# : 16
Offset : 226 
1st Codon : 1 

KMI LKMVQLDS I EDLBVTCTWKLPTLAKFS 
AAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAG^ 

Gene : PRAME

Segment# : 17
Offset : 241 

1st Codon : 1 

E V T C T W KLPTLAKFS PYLGQMINLRRLLLS 
GAGGTCACCTGTACCTGGAAGCTCCCCAC^CTGGCTAAfn^ 

Gene : PRAME

Segment# : 18
Offset : 256 
1st Codon : 1 

PYLGQMINLRRLLLSHIHASSYISPBKEEQ 
CCCTATCTGGGACAGATGATCAATCTGAGAAGGCTCX^ 

Gene : PRAME

Segment# : 19
Offset : 271 
1st Codon : 1 

HIHASSYI SPBKBBQYIAQPTSQFIiSLQCL 
CACATTCACGCTAGCTCCTACATTAGCCCTGAGAAAGAGGAA^ 

Gene : PRAME

Segment# : 20
Offset : 286 
1st Codon : 1 

YIAQFTSQFLSLQCLQALYVDS LPFLRGRL 
TACATTGCCCAATTCACAAGOCAATTCCrCAGCCTCCAGTG 

Gene : PRAME

Segment# : 21
Offset : 301 
1st Codon : 1 

QALYVDSL F P LRGRLDQLLRHVMNPLBTLS 
CAGGCTCTGTATGTGGATAGCCTCTTCT^^ 

Gene : PRAME

Segment# : 22
Offset : 316 
1st Codon : 1 

DQLLRHVMMPLBTLSITHCRLS EGDVMHLS 
GACTAACTGCTCAGGCATGTGATGAACCCTCTG^ 

Gene : PRAME

Segment# : 23
Offset : 331 
1st Codon : 1 

ITHCRLSBGDVMHLSQSPSVSQLSVLSLSG 
ATCACAAACTGTAGGCTCAGCGAACGCGATGTGAT 



Gene : PRAME

Segment# : 24
Offset : 346
1st Codon : 1
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QSPSVSQLSVLSLSGVMLTDVSPEPLQALL 
CAGTCCXrCCrCCGTGTCCCA^ 

Gene : PRAME

Segment# : 25
Offset : 361
1st Codon : 1

VMLTDVSPBPLQALLBRASATLQDLVPDBC 
GTGATGCTGACAGAOGTCAGCCCTGAGCCTC 



Gene : PRAME

Segment# : 26 

Offset : 376 

1st Codon : 1 

ERASATLQD L V F D ECG ITDDQLLALLPSLS 
GAGAGAGCCTCCGCCACACTGCAAGACCirGlXS'lTltACGAATGCG 

Gene : PRAME

Segment# : 27
Offset : 391 
1st Codon : 1 

GITDDQLLALLPSLSHCSQLTTLSFYGNS I 
GGCATTACCGATGACCAACTGCTCGCCC^ 

Gene : PRAME

Segment# : 28
Offset : 406 
1st Codon : 1 

H C S Q L T TL S P Y G N S X S ISALQSLLQHLIGL 
CACTGTAGCCAACTtiACAACCCTCAGCrTTTA 

Gene : PRAME

Segment# : 29
Offset : 421
1st Codon : 1

SISALQSLLQHLIGLSNLTHVLYPVPLBSY 
AGCATTAGCGCTCTGO^AAGCCrCCrGCAACACCTCA 

Gene : PRAME

Segment# : 30
Offset : 436
1st Codon : 1

SNLTHVLYPVPLBSYBDIHGTLHLBRLAYL 
AGCAATCTGACACACGTCCTGTATCCCGTCCCCC^ 

Gene : PRAME

Segment# : 31
Offset : 451 
1st Codon : 1 

EDIHGTLHLBRLAYLHARLRBLLCELGRPS 
GAGGATATCXATGGCACACroCATCTGGAAAGGCTCGCCT 



Gene : PRAME

Segment# : 32
Offset : 466 
1st Codon : 1 

HARLRBLLCELGRPSMVWLSANPCPHCGDR 
CAOGCTAGGCTCAGGGAACTGCTCTGCGJWVCTGGGAAG 

Gene : PRAME

Segment# : 33
Offset : 481 
1st Codon : 1 

MVWLSAHPCPHCGDRTPYDPEPI 

Gene : PRAME

Segment# : 34
Offset : 496 
1st Codon : 1 

TFYDPBP1LCPCFMPNAA 
AC C1 rn ' A CGATCCCGAACCCATrCTGTG TC CC I CT rf CA TGCCCAATGCCGCT 
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Gene 
Segment** 
Offset 
1st Codon 
A A L M 



TRP2IN2 

1 

1 

1 

E T H 



LSSKRYTEBAGG 



F P W 



K V Y Y Y R 



GCOGCTCTGATGGAGACACACCTCAGCTCCy^ 



Gene : TRP2IN2 

Segments : 2 
Offset : 16 
1st Codon : 1 

EAGGPFPWLKVYYYRFVIGLRVWQWEVI S C 
GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAG 
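
Each record pairs a peptide segment with the nucleotide sequence that encodes it, so a scanned DNA line can be used to cross-check the peptide line above it. The following is a minimal sketch of such a check, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification. Applied to the readable part of the TRP2IN2 segment 2 DNA line, it returns EAGGFFPWLKVYYY, which suggests that the "P" in the fifth position of the scanned peptide line is a misread "F".

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA codon by codon; unreadable codons become 'X'.

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        peptide.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(peptide)


# Readable part of the TRP2IN2 segment 2 DNA line.
dna_line = "GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAG"
print(translate(dna_line))
```

Codons damaged by scanning translate to "X" instead of raising an error, so a partially legible line can still be compared against its peptide.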

Gene : TRP2IN2 

Segment # : 3 
Offset : 31 
1st Codon : 1 

FVIGLRVWQWBVISCKLIKRATTRQPAA 
TTCGTCATCGGACTGAGAGTGTGGOU?TGGGAGGT^ 



Gene : NYNSOla 

Segments : 1 
Offset : 1 
1st Codon : 1 

AAMQAEGRGTGGSTGDADGPGGPGIPDGPG
GCCGCTATGCAAGCCGAAGGCAGAGGCAC^GGa5GAAGCACAGG



Gene : NYNSOla 

Segment# : 2
Offset : 16 
1st Codon : 1 

DADGPGGPGIPDGPGGNAGGPGBAGATGGR 
GACGCTGAOGGACXTCGGAGGCCCTGGCATTCrc 

Gene : NYNSOla 

Segments : 3 
Offset : 31 
1st Codon : 1 

GNAGGPGBAGATGGRGPRGAGAARASGPGG 
GGCAAT 



Gene : NYNSOla 

Segments : 4 
Offset : 46 
1st Codon : 1 

GPRGAGAARASGPGGGAPRGPHGGAASGLN 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCX3GAGGO^ 

Gene : NYNSOla 

Segments : 5 
Offset : 61 
1st Codon : 1 

GAPRGPHGGAASGLNGCCRCGARGPBSRLL 
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGGC^ 

Gene : NYNSOla 

Segments : 6 
Offset : 76 
1st Codon : 1 

GCCRCGARGPESRLLBFYLAHPFATPMBAB 
GGCTGTTGCAGATGCGGAGCCAGAGGCCCTXaAGT^ 



Gene : NYNSOla 

Segments : 7 
Offset : 91 
1st Codon : 1 

BFYLAHPPATPMBABLARRSLAQDAPPLPV 

GAGTTTTACCTCGCCA flaLXXT 1" 1 GCCACACCCATGGAGGCTGAGCTCGCCAGAAGGTCCCTGGC^ 



Gene : NYNSOla

Segment# : 8
Offset : 106
1st Codon : 1



LARRSLAQDAPPLPVPGVLLKEFTVSGNI L 
CTGGCTAGGAGAASCCrCGCCCAAGACGCTCCCCCTCT ^ 

Gene : NYNSOla 

Segments : 9 
Offset : 121 
1st Codon : 1 

P G V L L K E F TVSGHILTI RLTAADHRQLQLS 
CCCGGAGTGCTCCTGAAAGAGTTTACCOTCAGCGGAAACA 

Gene : NYNSOla 

Segments : 10 
Offset : 136 
1st Codon : 1 

TIRLTAADHRQLQLS ISSCLQQLSLLMWIT 
ACXATTA<3GCrCACCGCrcCCGAT^ 

Gene : NYNSOla 

Segment# : 11
Offset : 151 
1st Codon : 1 

I SSCLQQLSLLMW ITQCPLPVPLAQPPSGQ 
ATCTCCAGCTGTCTGCAACAGCTCA 



Gene : NYNSOla

Segment# : 12
Offset : 166
1st Codon : 1

QCFLPVFLAQPPSGQRRAA
CAGTGTTT< 'AGCGGACAGAGAAGGGCTGCC



Gene : NYNSOlb 

Segment# : 1
Offset : 1 
1st Codon : 1 

AAMLMAQEALA FLMAQGAMLAAQB RRV P RA 

GCCGCTATGCTCATGGCTCAGGAAGCCCTOXXr^^ 



Gene : NYNSOlb

Segments : 2
Offset : 16
1st Codon : 1

QGAMLAAQERRVPRAAEVPGAQGQQGPRGR
CAGGGAGCCAT



Gene : NYNSOlb 

Segments : 3 
Offset : 31 
1st Codon : 1 

GCcJlAGl^^ A A R L Q G 

Gene : NYNSOlb 

Segments : 4 
Offset : 46 
1st Codon : 1 

EBAPRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGCA 



Gene : LAGE1 

Segments : 1 
Offset : 1 
1st Codon : 1 

A A H Q A B GQGTGGSTGDADGPGGPGIPDGPG 
GCCi^iAXiiCAAGCCGAAGGCCAA 

Gene : LAGE1 

Segments : 2 
Offset : 16 
1st Codon : 1

DAD6 PGGPGI PDGPGGNAGGPGEAGATGGR 
GACGCTGAOGGACCCGGAGGCCCTGGCATTCCCGATGGC^ 

Gene : LAGE1 

Segments : 3 
Offset : 31 
1st Codon : 1 

GNAGG PGEAGATGGRG PRGAGAARASG PR G 
GGCAATGCCGGAGGCCCTGGCGAAGOCGGAGCCACAGGCGG^ 

Gene : LAGE1 

Segment # : 4 
Offset : 46 

1st Codon : 1 

G P RGAGAARASGPRGGAPRG PHGGAASAQD 

GGCCCTAGGGGAGCCGGAGCCXXTAGGGCTAGCGG^ 

Gene : LAGE1

Segments : 5 
Offset : 61 
1st Codon : 1 

GAPRGPHGGAASAQDGRCPCGARRPDSRLL 
GGCGCTCtX»GAGGCCCTCACBGAGGCGCTG^ 

Gene : LAGE1 

Segments : 6 
Offset : 76 
1st Codon : 1 

GRCPCGARRPDSRLLQLHITMPFSS PMBAE 
GGCAGATGC<XnTGCGGAGCCAGAAGGCCIX» 

Gene : LAGE1 

Segments : 7 
Offset : 91 

1st Codon : 1

QLHITMPFSS PMBABLVRRI LSRDAAPLPR 
CAGCTCCACATTACCATGCCCTTTAGCIt^ 

Gene : LAGE1

Segments : 8 
Offset : 106 
1st Codon : 1 

L JL_ R RILSHDAAPI »PRPGAVLKDFTVSGNLL 
CTGGTCftGGAGAATCCTCAGCAGAGAa^^ 

Gene : LAGE1

Segments : 9 
Offset : 121 
1st Codon : 1 

P G A __ V _ L * P P *__ V _ S GN I*^^I RLTAADHRQLQLS 
CCCGGAGCCGTCCTGAAAGACTTTACCGTCAGCGGAAArc 

Gene : LAGE1 

Segments : 10 
Offset : 136 
1st Codon : 1 

FIRLTAADHRQLQLSISSCLQQLSLLMWIT 
TTCATrAGGCTCACCGCTtKXGATCAO^GAC^ 

Gene : LAGE1

Segments : 11 
Offset : 151 
1st Codon : 1 

I S S C L Q Q ■!» SLL M W ITQCFLPVFLAQAPSGQ 
ATCTCOUSCrGTCTGCAACAGCTCnGCC^^ 

Gene : LAGE1

Segments : 12 
Offset : 166 
1st Codon : 1 

QCPLPVPLAQAPSGQRRAA 
CAGTGTTTCCTCCCC
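
The records above follow a regular pattern: full segments are 30 residues long, successive Offset values advance by 15, and the first and last segments of each gene begin and end with two alanines. The sketch below reproduces that pattern under assumptions read off the listing itself, not from a stated procedure: a 30-residue window, a 15-residue step, and an "AA" pad at both ends. The function name `segment_protein` and the toy input are hypothetical.

```python
def segment_protein(parent, window=30, step=15, pad="AA"):
    """Split a padded parent sequence into overlapping windows.

    Returns (segment_number, offset, peptide) tuples with 1-based offsets,
    stopping at the first window that reaches the end of the padded sequence.
    """
    padded = pad + parent + pad
    records = []
    start = 0
    number = 1
    while True:
        peptide = padded[start:start + window]
        records.append((number, start + 1, peptide))
        if start + window >= len(padded):
            break
        start += step
        number += 1
    return records


# Hypothetical 58-residue parent, used only to show the record layout.
toy_parent = "M" + "ACDEFGHIKLNPQRSTVWY" * 3
records = segment_protein(toy_parent)
for number, offset, peptide in records:
    print(f"Segment# : {number}  Offset : {offset}  {peptide}")
```

With these settings every residue of the parent falls in at least one window, and each interior residue appears in two neighbouring segments.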
Segments in scrambled order: 
MAGE-1 #15

APE B E I W BELSVMBVYDGRBHSAYGBPRKL 
6CCCCTGAG6AAGA6ATTTG6GAAGAGCTCAG0GTCATGGAA6TGTA 

MAGE-1 #4

BBVPTAGSTDPPQSPQGASAFPTTINFTRQ 
GAGGAAGTGCCTACCGCIWCTCCACCGATCCCrCT^ 



PRAME #10

TVWSGNRASLYSPPEPBAAQPMTKKRKVDG
ATGACCAAAAAGAGAAAGGTCGACGGA

MAGE-3 #14

QIMPKAGLLIIVLAIIAREGDCAPEEKIWE
CAGATTATGCCTAAGGCTGGCCTCCTGATTAT^TCATTGCOVGAGAGGGAGACTGTGCCCCTGAGGAAAAG^



PRAME #9 

LQVLDLRKNSHQDFWTVWSGNRASLYSPPB 
CTGCAAGTGCTCGACCTCAGGAAAAACTCCCACCAAGACT 



PRAME #8

LDVLLAQEVRPRRWKLQVLDLRKNSHQDFW
CTGGAT VTCTGAGAAAGAATAGCCATCAGGATTTCTGG



NYNSOlb #2

Q G A M L A AQER RVPRAABVPGAQGQQGPRG R 
CAGGGAGCCATGCTGGCTGCCCAAGAGAGAAGGGTCCCCAGAGCCGCTGAG 



PRAME #24

QSPSVSQLSVLSLSGVMLTDVSPEPLQALL
WXXTrCACCGATGTGTCCCCCGAACCC^^



MAGE-1 #17

LTQDLVQBKYLBYRQVPDSDPARYBFLWGP 
CTGACACAGGATCTGGTCCAGGAAAAGTATCTGGAAT^ 

MAGE-1 #6

RQPSBGSSSREBBGPSTSCILBSLFRAVIT 
AGGCAACCCTCOGAGGGAAGCTCCAGCAGAGAGGA 



BAGE #1

AAMAARAVFLALSAQLLQARLMKEESPVVS
GCCGCTAT^TCCTGCAAGCCAGACTGATGAAGGAAGAGTCCCCCGTOGTGTC



PRAME #34 

T F Y DPBPILCPCFMPNAA 
ACCTTTTAOGATCCOGAACCCATrCTGTGTCCCTGTTTCATG 

MAGE-3 #12

IBLMBVDPIGHLYIFATCLGLSYDG LLGON 
ATCGAACTGATGGAGGTCGACCCDVTCGW 

GAGE-1 #2

* R Y V B P P B MIGPMRPBQFSDBVBPATPBEG 
AGGAGATACGTOGAGCCTCCCGAAATGATTGGCCCTATGA 

TRP2IN2 #2 

BAG G F F P HLKVYYYRFVIGLRVHQNBVISC 
GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATT^ 

PRAME #1 

AA M BRRRLWGSIQSRYI SMSVNTS PRRLVB 
GCCGCTATGGAAAGGAGAAGGCTCTGGGGAAG 

TRP2IN2 #1

A A L MBTHLSSKRYTBBAGGFFPWLKVYYYR 

GCOGCTCIGATGGAGACACACCTCAGCTCCAAGAGATACACAGAGGAA I tAJCT l t^ CrCAAGGTCTACTATTACAGA 
MAGE-1 #1

AAMSLBQRSLHCKPBBALBAQQEALGLVCV 
GCCGCTATGTCCCTGGAACAGAGAAGCCTCCACTGT 

MAGE-1 #3

QAATSSSS PLVLGTLBEVPTAGSTDPPQS P 
CAGGCTGCCACAAGCTCCAGCTCCCCCCTCGTGC^^ 

PRAME #4

ALELLPRELF PPLPMAA FDGRHSQTLKAMV 
GCCCTCGAGCTCCTGCCTAGGGAACTGTTTCCrcCTCTGTT^ 

MAGE-3 #16

E L S V L E V F BGREDSILGDPKKLLTQHFVQE 
GAGCraGCGTCCTGGAAGTGTTTGAGGGAAGGGAAGACTC^ 

MAGE-1 #11

B S L Q L V F G IDVKBADPTGHSYVLVTCLGLS 
GAGTCCCTGCAACTGGTCTTCGGAATCGATGTGAAAGAGGCT^^ 

MAGE-3 #5

PDPPQSPQGASSLPTTMNYPLWSQSYEDSS 
CCCGATCCCCCTOkGTCCCCCCA^ 

LAGE1 #1

AAMQABGQGTGGSTGDADGPGGPGIPDGPG 
GCCGCT7VTGCAAGCCGAAGCKXAAGGCACAG 

NYNSOla #12 

QCFLPVFLAQPPSGQRRAA 
CAGTGTTTCCTCCCCGTCTTCCTCGCCC^ . 

gp100In4 #2

T W G B G L P S QPI IHTCVYFFLPDHLSFGRPF 
ACCTGGGGOGAAGGCCTCCCCTCCCAGCCTATCATTCACACATG 

MAGE-1 #7

STSCIX.BSLFRAVITKKVADLVGFLLLKYR 
AGCACAAGCTGTATCCTCGAGTCraxrm 

NYNSOla #1 

AAMQABGRGTGGSTGDADGPGG PGI PDGPG 
GCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCGGAAGC^ 

GAGE-1 #7

DGPDGQEMDPPNPBBVKT PEEEMRSHYVAQ 
GACGGACCCGATGGCCAAGAGATGGACCCTCXrAATCC^^ 

NYNSOla #11 

1 s S ~S-JL qqlsllmwit Q cf i*pvflaqppsgq 

^TCTCCAGCTGTCTGCAACaGCTCAGCrTC^^ 
PRAME #26

BRASATLQDLVFDBCGITDDQLLALLPSLS 
GAGAGAGCCTCCGCCACACTGCAAGACCTOGT^lTf^ 

MAGE-3 #17

I*G D P K KLLTQHPVQBNYLBYRQVPGSDPAC 
CTCGGftGACCCTAAGAAACrGCTCACOCAACA ... 

MAGE-1 #2

BALEAQQEALGLVCVQAATSSSSPLVLGTL 
GAGGCTCTGGAAGCCCAACAGGAAGCCCTCGGCC^ 

NYNSOla #7 

g F Y LAM P P A TPMBABLARRSLAQDAPPLPV 
GAGV^r/iCCTCGCCATGCCCmiSCCaCACC^ 

NYNSOlb #4 

BBAPRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTCJVGAATGGCTGCCAGACT 
BAGE #3

WRLBPEDGTALCFIFAA 
GAGE-1 #3

B ° S DBVBPATPBBGBPATQRQDPAA. AQBG 

GAGCAATTCTCCGACGAAGTGGAACCCGCTACCCCTGAGGAAGGCG^ 

MAGE-3 #6

T M N Y P -- L — W SOSYBDSSNQBBBGPSTPPDLBS 
ACCATGAACTATCCCCTCTGGTCCCAGTCCTACGAAGACTCC^ 

MAGE-3 #7

^^^^^^E^B^G^^^S TFPDLBSEFQAALSRKVABLVH 
PRAME #13

V D L F ** K B G A C D B *» F S Y L I EKVKRKKNVLRL 
GTGGATCreTTTCTGAAAGAGGGAGCCTGTGACG 

NYNSOla #10 

TIRLTAADHRQLQLS I SSCLQQLS LLMW I T 
ACCATTAGGCK3UXGCTGCCGATCACAGACAG 

MAGE-3 #1

* A M P I* B QRSQHCKPBBGLBARGBALGLVGA 
GCOGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAA^ 

NYNSOla #2 

DADGPGGPGIPDGPGGNAGGPGBAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATKXXX^ 

MAGE-3 #19

YEFLWG PRALVBTSYVKVLHHMVKISGG PH 
TACGAATTCCTCTGGGGACCCAGAGCCCTt^^ 

PRAME #23

I T K C RLSBGDVMHLSQSPSVSQLSVLSLSG 
ATCACAAACTGTAGGCTCAGOGAAGGCGATGTGATGCACCTCAGCCA^ 

MAGE-3 #18

» Y _> BYRQVPGSDPACYBFLWGPRALVETSY 
AACTATCTGGAATACAGACA{^TCCCCGGAA 

MAGE-3 #11

V IPS KASSSLQLVFG I BLMBVDP IGHLY I F 
GTGATTTTC'fCXJAAGGCITUXrrcCA 

PRAME #21

QALYVDSLFFLRGRLDQLLRHVMNPLBTLS 
CAGGCTCTGTATGTGGATAGCCTCTTCTTTCTGAGAGGCAGA 

PRAME #20

YIAQFTSQFLSLQCLQALYVDSLFFLRGR1* 
TACATTGCCCAATTCACAAGCCAATTCCT^^ 

PRAME #7

GQHLHLBTFKAVLDGLDVLLAQBVRPRRWK 
GGCCAACACCTCCACCTCGAGACATTCAAAGCt^^ 

LAGE1 #10

FIRLTAADBRQLQLS ISSCLQQLSLLMWIT 
TTCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAG^ 

PRAME #15

C C K K L K I F A M PMQDIKMILKMVQLDSI B D L 
TGCTGTAAGAAACTGAAAATCTTTGCCATGCCCATGCAG 

NYNSOla #5 

GAPRGPHGGAASGLNGCCRCGARGPESRLL 
GGCGCTCCCAGAGGCCCTCACGGACGCGCTGan^^ 
MAGE-1 #8

KKVADLVGP L L LXYRARE PVTKAEMLESV I 
AAGAAAGTGGCTGACCTCGTGGGATTCCTCCTGCT^ 

MAGE-1 #13 

Y ° G L _ L G DNQIHPKTGPLI IVLVMIAMEGGH 
TACGATGGCCTCCTGGGAGACAATCAGATTATGCCTAAGACAGGC^ 

PRAME #29 

S ISALQSLLQHLIGLSNLTHVLYPVPLBS Y 
W^TTAGCGCTCTGCAAAGCCTCCTGCAACAC^ 

MAGE-3 #15

IAREGDCAPBEKIWBELSVLEVFBGREDSI 
ATCGCTAGGGAAGGCGATTGCGCTCCCXJAAGAGAAAATCT^ 

PRAME #22 

DQLLRHVMNPLBTLSITNCRLSBGDVMHLS 
GACCAACTGCTCAGGCATGTCATCAACCCTCTGGAAACCCTCAGC^ 

MAGE-1 #19 

R ------ A B T S Y v KVLEYVIKVSARVRFFFPSLR 

AGGGCIVTCGITGAGACAAGCTATGT^ X IT A ' TL 'r rj ' C CCTCCCTGAGA 

PRAME #30 

S H L T H VLY PVPLB5YBOI HGTLHLBRLAYL 

AGCAATCTGACAC^C^iwa^-aTCCCGTCCCCCT 

NYNSOlb #1

AAMLMAQBALAFLMAQGAMLAAQBRRVPRA 
GCCGCTATGCTCATGGCTCAGGAAGCCCTCGCCTTTCTG^ 

MAGE-1 #10 

RNYKHCFPBIFGKASBSLQLVFGIDVKBAD 
AAGAATTACAAACACTGTTTCCCTGAGATTT^^ 

MAGE-3 #4

TLVBVTLGBVPAAESPDP PQSPQGASSLPT 
PRAME #32

HARLRBLLCELGR PSMVWLSANPCPHCGDR 
CACGCTAGGCTCAGGGAACTGCTCTGCGAACTGGGAA^ 

PRAME #25 

VMLTDVSPEPLQALLBRASATLQDLVFDEC 
CTGATGCTGAO^GACGTCAGCCCTGAGCCTCTGCAA^ 

GAGE-1 #5

HDBGASAGQGPKPBADSQBQGH PQTGCBCB 
GAGGATGAGGGAGCXTTCCGCCGGACAGGGACCCAAACCC^^ 

MAGE-3 #10

EMLGSVVGNWQYFPPVI FSKAS SSLQLVPG 
GAGATGCTGGGAAGCCTCGTGGGAAACTGGCAGTAJ'T 

GAGE-1 #1

AAMSMRGRSTYRPRPRRYVEPPEMIGPMRP 
GCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCCAGAA 

PRAME #2

YISMSVWTSPRRLVBLAGQSLLKDBALAIA 
TACATTAGCATGAOOGTCTGGACAAGCCCTAGGACACTGGTOGA^ 

MAGE-1 #16

YDGREHSAYGBPRKLLTQDLVQBKY LBYRQ 
TACGATGGCAGAGAGCATAGCGCTTACGGAGAGOCTAGGAAACTGCTO 

LAGE1 #12

QCFLPVFLAQAPSGQRRAA 
CAG1G 1 TTCCT CCCCGiVrit lC IUt X CC A AGCCCCTAGOGGAQUSAGAAGGGCTG^ 
MAGE-3 #20

VKVLHHMVKISGG PHI SYPPLH EWVLREGE 
GTGAAAGTGCTCCACX^TATGGTCAAGATTAGCGG^ 

LAGE1 #7

Q L H ITMPPSSPMEABLVRRI LSRDAAP LPR 
CAGCTCCACATTACCATGCCCTTTAGCTCCCCCATGGAGGCTG^ 

NYNSOla #9 

PGVLLKBPTVSGHI LTIRLTAADHRQLQLS 
CCCGGAGTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGA 

PRAME #16 

KMILKMVQLDS IEDLBVTCTNKLPTLAKPS 
AAGATGATCCTCAAGATGGTGCAACTGGATAGCATT^ 

MAGE-1 #14 

PLI IVLVMIAMEGGHAPEBEIWEBLSVMEV 
TTCCTCATCATTGTGCTCGTGATGATCGCTATGGA^^ 

PRAME #17 

B V T C T WKLPTLAKFSPYLGQMINLRRLLLS 
GAGGTCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTT^ 

MAGE-3 #2

EGLBARGBALGLVGAQAPATEBQBAAS S S S 
MAGE-3 #21

ISYPPLHEWVLRBGEEAA 
ATCTCCTACCCTCCCCTCCACGAATCGGTCC^ 

PRAME #19

HIHASSYISPEKBBQYXAQFTSQFLSIiQCL 
CACATTCACGCTAGCTCCTACATTAGCCCTG^ 



GNAGGPGBAGATGGRGPRGAGAARASG PGG 
GGCAATGCCXXSAGGCCCTGGCGAAGCCGGAGCCACAG^ 

NYNSOla #4 

GPRGAGAARASGPGGGAPRG PHGGAASGLN 
GGCCCTAGGGGAGCCGGAGCCGCrAGGGCTAGCGGACCC^ 

MAGE-1 #5 

QGASAPPTTINPTRQRQPSEGSSSRBBBGP 
CAGGGAGCCTCCGCCTTTCCCACAACCATT^^ 

NYNSOla #8 

LARRSLAQDAPPLPVPGVLLKEFTVSGNIL 
CTGGCTAGGAGAAGCCTCGCCCAAGACGCTCCCCC^ 

PRAME #5 

AAPDGRHSQTLKAMVQAW PFTCLPLGVLMK 
GCC GCi Tl C GATGGCAGACACTCCCAGACACTGAA 

MAGE-1 #20 

IKVSARVRPPPPSLREAALREEEBGVAA 
PRAME #27

GITDDQLLALLPSLSHCSQLTTLSFYGNSI 
GGCATTACCGATGACCAACTGCrCGCCCTCCTGCCTA 

GAGE-1 #8

VKTPBBBMRSHYVAQTGI LWLLMNNCFLNL 
GTGAAAACCOTGAGGAAGAGATGAGGTCCCACT A TC 

LAGE1 #11

ISSCLQQLSLLMWITQCPLPVPLAQAPSGQ 
ATCKX3«XritntnTX3kACAGCTQCC^^ 



NYNSOla #3 
PRAME #14

YLI BKVKRKKHVLRLCCKKLKIFAMPMQDI 
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGAGA^ 

MAGE-1 #9

ARB P V TKAEMLESVIKNYKHCPPEIFGKAS 
GCCAGAGAGCCTGTGACAAAGGCIX5AGATGCItX3AAAGCGTC^ 

LAGE1 #8 

LVRRILSRDAAPLPRPGAVLKDFTVSGMLL 
CTGGTOVGGAGAATCCTCAGCAGAGACGCTGCCCCTCTTC 

PRAME #28 

H C S Q L T TL S P Y GNSISISALQSLLQHLIGL 
CACIXTTAGCCAACTGACAACCCTCAGCTTTTACGGA 

PRAME #33

H V W J*__S ANPCPHCGDRTFYDPEPI LCPCFMP 
ATGGTCTGGCTCAGCGCTAACCCTTGCCC^ 

gp100In4 #1

A * S — !L SQKRSPVYVWKTW GEGI>PSQPIIHTC 
GCCGCTAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTG^ 

BAGE #2

LLQARLMKEES PVVSWRL BPBDGTA LC PI F 
CTGCTCCAGGCTAGGCTCATGAAAGAGGAAAGCCCTGT^ 

gp100In4 #3

VYF F L P D H LS FGR PFH LN FCDFLAA 
PRAME #18

PYLGQMINLRRLLLSHIHASSYI SPEKBEQ 
CCCTATCTGGGACAGATGATCAATCTGAGAAGGCTCCTGC^ 

MAGE-3 #3

QAPATBBQBAASSSSTLVBVTLGBVPAABS 
CAGGCrCCCGCTACCGAAGAGCAAGAGGCTGCCTCCAGCT^^ 

PRAME #6 

QAWPFTCLP LGVLMKGQHLHLBTFKAVLDG 
CAGGCTTOGCCTTrCACATGCCTCCCCC^^ 

PRAME #12 

I. S T E A E Q P F IPVBVLVDLFLKBGACDBLFS 
CTTGTCCACCGAAGCCGAACAGCCTTTCATTCCCG 

NYNSOlb #3 

A B V ^-JLJl -° G0QGPRGRBBAPR G V RMAARLQG 
GCCGAAG1 ^ CTC ^^ 

LAGE1 #5

° A P R G - P " GGAASA Q DGRCp C<5ARRPDSRLL 
GGCGCItXX»GAGGCCCTCACGGAG^ 

LAGE1 #4

GPRGAGAARASGPRGGA PRGPHGGAASAQD 
GGCCCTAGGGGAGCCGGAGCOGCTAGGGCTAGCGGACXrCAGA 

PRAME #3 

LAGQSLLKDEALA IAALBLLPRE L F PPLFM 
CTGGCTGGCCAAAGCCItXTGAAAGAOGAAGCCCTCGCCATTGCCG 



GAGE-1 #4

B P A TQRQOPAAAQBGEDBGASAGQG PKPBA 
GAGCCTGCCACACAGAGACAGGATCCCGCTGOOGCTCAGGAAGG 

PRAME #11

P B A A Q PMTKKRKVDGLSTBABQPFI PVBVL 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGG 
LAGE1 #6

GRCPCGARRPDSRLLQLH ITMPFS S P M E A B 
GGCAC^TGCCCTTGCGGAGCCAGAAGGCCTG^ 



LAGE1 #9

PGAVLKDFTVSGNLLFI RLTAADHRQLQLS 

CCCGGAGCCGTCCTGAAAGACTTTACCGTCAGCGGW 

PRAME #31

BDIHGTLHLERLAYLHARLRBLLCBLGRPS 
GAGGATATCCATGGCACACTGCATCTGGAAAGGCTCGCCTATCTGC^ 

GAGE-1 #6

DSQBQGHPQTGCB CEDG PD6QBHD PPNPBE 

GACTCCCAGGAACAGGGACACCXTTCAGACAGGCTG^ 

TRP2IN2 #3 

FVIGLRVWQWEVI SCKLI KRATTRQPAA 
TTCGTCATCGGACTGAGAGTGTGGCAGTGGGAGGTCATCT 

LAGE1 #2

DADGPGGPGI PDGPGGNAGGPGBAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGG 

MAGE-1 #12

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG 
CCCACAGGCCATAGCTATGTGCTCGTGACATGCCTCGG 

MAGE-3 #9

FLLLKYRAREPVTKAEMLGSVVGNWQYFFP 
TTCCTCXrKXrrCAAGTATAGGGCTAGGGAACCCGTC^ 

GAGE-1 #9 

TGILWLLHNHCFLNLSPRKPAA 
A^CGGAATCCTCTGGCTCCTGATGAAOU ^TTCCTTT CTGAA 

MAGE-3 #8

BFQAALSRKVAELVHFLLLKYRARE PVTKA 
GAGTTTCAGGCTGCCCTCAGCAGAAAGGTCGCCGAACTGGTCCA 

MAGE-1 #18

VP DSDPARYEFLWGPRALABTSYVKVLBYV 
GTGCCTGACTCCGACCCTGCCAGATACGAATTCCTCTGG^ 

NYNSOla #6 

GCCRCGARGPESRLLBFYLAMPFATPMEAE 
GGCTGTTGCAGATGCGGAGCCAGAGGCCCTGAGTCCAG 

MAGE-3 #13

ATCLGLSYDGLLGDNQIMPKAGLLI I V L A I 
GCCACATGCCTCGGCCTCAGCTATGACGGACTGCTC^ 

LAGE1 #3

GNAGG PGEAGATGGiRG PRGAGAARA SG PRG 



Artificial Protein: 



APBBBIWBBLSVMBVYDGREHSAYGBPRKLEEVPTAGSTPPP^ 
LI IVLAI ITkREGDCAPBBKIHBLOVTjDLRXNSHODFVTIVW 
BVPGAQGQQGPRGRQSPSVSQLSVI^LSGVMLT^ 

FRA VI TAAMAARA VFIAIjSAQLLOARLMKBES PWSTFYD PB P I LC PCFM PN AA.I BLKBVD PI GHLYI PATCLGLSYDGLLGDNRRYVE PPEMIGPMR 
PEQFSDEVEPATPBEGBAGGFFPW1JCVYYYRPVI GUI VWOWBVI SCAAMBRRRLWGS I QSR YI SMSVWTS PRRLVEAAIJ4ETHLSSKRYTEEAGGFF P 
WLKVYYYRAAMS LEQRSLHCKPEEAIJ2AQQEALGLVCVQAATS SS S PLVLGTLBEVPTAG STD P PQS PALELLPRELF P PLFMAAFDGRH S QTLKAMV 
BI^VLEVFEGREDSILGDPKIULLTQHFVQB^ 

GTGGSTGDADGPGGPGI PDGPGQCFLPVFIiAQPPSGQRRAATWGBGLPSQPI IHTCVYPFLPDHLS PGR PPSTSCI LBSLFRAVI TKJCVADLVGFLLL 
KYRAAMOABGRGTGGSTGDADGPGGPGI PDGPGDGPDGQBMD PPNPBBVKTPEEEMRSHYVAQI SSCLQQI^IJ>!W ITQCFLPVFLAQPPSGQERASA 
TLQDLVFDEGGITIHX)IiIALI»PSLSLGDPiaa^ 

ABIARRSIAODAPPIJWEAPRGVRMAARLQGAAHRI^ FAABQFSDEVBPATPBEGEP ATQRQED P AAAQBGTMNYPLWSQSYEDS SNQ 
BEEG PSTFPDLBSMQBBEG PSTFPDI<ESBFQAALSRXVAE1>VHVDLFLKEGACDBLFSYLI BKV KJtKKNVLRLTT RLTAADHRQLQLS I S SCLQQLS L 
UWITAAMPLEQRSQHCkPBEGLBARGBAI^LVGADATO 
CRI^EGDVMHI^G^PSVSQLSVLSLSGHY1*BYRGVPGSD 
LGfll^IJ^ITCCiaa^IFAHPMQDIKM

SVIYDGIXGDHQIMPICItmaiVLVMIAM^^ 

HVWPLETLSITNCRLSEGDVMHLSRALAETSYVKVl^^ 

MAQGAMLAAQBRRVPRAKNYKHCFPBI PGKASESIX)LVFGIDVICBAim»VBVTLGBVPAABSPDP 
NPCPHCGDRVMLTITVSPEPLQALLERASATLQ^ 

GAAMSWRGRSTYRPR PRR YVBP PBM IG PMR PYI SMSVKTS PRR1*VBI*AGQSLLKDEALAI AYDGREHS AYGE PRXIXTQDI»VQEJCYLEYRQQCFI*FVF 

IAQAPSGQRRAAVKVUIHMVKISGGPHISYPPIJIBWVLRE^ 

IX)LSKMIIJCMVQlJ>SIBDLBVTCTWKI,PTIJUtFSFL 

ARGEALCLVGAQAPATEEQEAASSS S I SYPPLHEWVLREGEEAAHIHASSYIS PEXBEQY I AQFTSQFLSLQCLGHAGG PGEAGATGGRG PRGAGAAR 

ASGPGGGPRGAGAARASGPGGGAPRGPKGGAASGI^QGASAFPTTINFTR^ 

FDGRHStfTXXAMVQAWPPTCLPLGVI^^ 

VAQTCILWLLKNHCFUH^SSCLC^^ 

CFPBIFGKASLVRRII^RI^PI^RPGAVIJCDPTVSGMI^^ 

MPAASNSQKRSFVyVWKTWGEGLPSQPIIHTCLLOARI>!XBE^ 

UJ^IHASSYISPEKEEQOAPATEEQBAASSSSTLVBVTLGBV 

LKEGACDELFSAEVPGAQG<XX3PRGREEAPRGVRMAARIj<X?GAPRG PHGGAASAQDGRC PCGARR PDSRLLG PRGAGAARASG PRGGAPRG PHGGAAS 

AQDLAGQSLIjKDEAIjAIAALEIiLPRELFPPLFMEPATQ^ 

GARRPPSRIJfllJIITMPPSSPMEAEPGAVIJrorTVSGNIJ^ 

CEDGPDGQEMD PPNPBEPVI GLR VWQWEVI SCKLI KRATTRQPAADADGPGGPGI PTCPGGNAGGPGBAGATGGRPTGHS YVLVTCl/SIfS YDGLLGCW 
QIMPKTGFLIaLKYRARE PVTKAEMI^SVVGIIWQYTFPTGI LWUjMNNCFUILS PRKPAAE FtJAALSRK VAELVHPLLIJCYRARE PVTKAVPDSDPAR y 
EFLWG PRAIiABTSYVKVLBYVGCCRCGARGPESRlil^PYLAMP FAT PMEAEATCIjGLSYDG LLGDNQ I M PKAGLLI I VLAI GKAGGPGEAGATGGRG P 
RGAGAARASGPRG 
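
The artificial protein above reads as the scrambled segments joined end to end in the listed order: its opening residues match the first two entries of the scrambled list. A minimal sketch of that step follows. How the scrambled order was actually chosen is not stated in this section, so the seeded pseudo-random shuffle is an assumption, and `scramble_and_join` and the toy segments are hypothetical.

```python
import random


def scramble_and_join(segments, seed=0):
    """Shuffle (label, peptide) pairs reproducibly and concatenate the peptides.

    Returns the shuffled list and the joined sequence.
    """
    order = list(segments)
    random.Random(seed).shuffle(order)
    protein = "".join(peptide for _, peptide in order)
    return order, protein


# Hypothetical toy segments; real input would be the 30-residue records.
toy_segments = [
    ("X #1", "AAMKT"),
    ("X #2", "KTLLG"),
    ("Y #1", "AAMQA"),
    ("Y #2", "QAEAA"),
]
order, protein = scramble_and_join(toy_segments, seed=1)
for label, peptide in order:
    print(label, peptide)
print(protein)
```

Applying the same order to the DNA segments instead of the peptides would give the corresponding artificial nucleotide sequence.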

Artificial DNA:



GCCCCTGAGGAAGAGATTTGGGAAGAGCTCAGCGTCATGGAA^ 

GCCTACCGCTGGCTCCACCGATCCCCCTC 

ATAGGGCTAGCCTCTACTCCTTCOT 

CTGATTATCGTCCTGGCTATCATTGCCAGAGAGG 

CCAAGACTTTTGGACAGTGTGGAGCGGAAACAGAGCCrCCCT 

GGAAGCTCCAGGTCCTGGA 

CGATGTGTCC CCCGaACTC CTCCACGC^ 

CTAGGTATGAdT rCTGTGGGGCXXITAGGCAACCCTCCGAGGGAAGCT^ 

TTCAGAGCCGT CATCAC^ 

CGTCG TGTCXaili- i-irjiCGA 

TCTACAiri xiA^ rACCTGTC^ 

CCCGAACAGTTTAG^GATGAGGTCGAGCCTGCCA 

GATTGGCCTCAGGGTCTGGCA^ 

CCGTGTGGACCTCCXXXrAGAAGGCrrCGTGGAAG 

"TCGCTCAAGGTCTACTATTACAGAGCCGCTATGTCCCTC 

GGGACTGGTCTGCGTCCAGGCTGCCACA* 

AAAGCXXTGCCCTCGAGCTCCTGC^^ 

GAGCTCAGOGTCCTGGAAGTGTTTGAGGG^ 

GCA^lWlVritJU^TCGATGTGAAA^ 

CCXXXXS^GGCGCTAGCTCCCTGCCTACCACAATG^ 

GGCAC !AGGOGGAAGCACAGGOGATGCOGATG GC CCrGGOG 

CCCTAGCTGACAGAGAAGGGCItXXACCTGGGGCGAAGGCC^ 

GCTTTGGCAGACCCrTTAGCACAAGCTGTAT^^ 

AAATACAGAGCCGCTATGCAAGCCGAAGGCAGAGGCACAG 

AGACGGACCCGATGGCCAAGAGATGGACCCTCCCAATCC^ 

GCTGTCTGCAACAGCTCAGCCTCCTGATGTCiGA 

AO^CTGCAA GAljLft^ WlTltJAOGAATGOGGA^ 

CACCCAACACTrra 

GCCTC GTGTGTGTtX3^ 

GCTGAGCTCGCCAGAAGGTCCCTGGCTCAGGATGCCOT 

TGCCTGGAGACTGGAACCOGAAGACGGAACOGCTOtilUrritA 

CCGAACCOGCTACCCAAAGGCAAGACCCT GC C GC TGCOLAAGAGG 

GAAGAGGAAGGCCCTAGCACATTCCCTGACCTCGA 

TCTGTCCAGGAAAGTGGCTGAGCTCGTGCATGTGGAT^ 

GGAAAAAGAATGTGCTCACGCTCACCATTAGGCTCACOGCTGCCX^ 

CTCATGTGGATCACAGCCGCTATGCCTCTGGAAC 

CGGCGCTGAOGCTGACGGACOCGGAGGCCCTGGCATTCCCGATGGC^ 

ACGAATTCCTCTGGGGACCCAGAGCCCT 

TGTAGGCTCAGCGAAGGCGATGTGATGCACCTCAGCCAA 

ACAGGTCCCCGGAACCG AT CCCGCrft CT ATGA^ 

GCCTCCAGClCt>lG'ITrU ^ ri tJA GCTOT^^ 

AGAGGCAGACTGGATCAGCTCCTGAGACAOGTC^ 

TCTGCAAGCCCTCTACGTCGACTCCCTGT^^ 

AOGTCCTGCTOGCCCAAGAGGTCAGGOCTAGGAGATG 

CTCCAGCAACTOTCCCTGCTCATGTGGA 
CCC^^AAGCACyVCTGCTCAA^

TCCGTGATTTACGATGGCCTCCTGGGAGACAATCAGATTATG^ 
TAGCATTAGCGCTCTGCAAAGCCTCCTGO^^ 
CXSGAAGGCGATTGCGCTCCOSAAGAGAAA^ 
CATGT<»TGAACCCTCTGGAAACCClt»GCATTACXA^ 

TGTGAAAGTGCTCGAGTATGTGATTAAGGTCA^ 1 1 1 rTL ' riTC CCTCCCT G AGAAGC^ 

CCCTa3«nrcrACGAAGACATTCACGG^ 

ATGGCCCAA<X5CGCTATGCTCGCOGCTCA 

AAGCCTCCAGCTCGTGTTTGGCATTGACGTCAAG 

CCCAAAg^CTCAGGGAGCCTCCACXTCT 

AATCCCTGTCCCCATTGCGGAGACAGAGTGATGCTGACAGACGTCAGCCCTGA^ 

TCTGGTCTTCGATGAGTGTGAGGAT^ 

GCGAATGCX^AGAGATGCTGGGAAGCGTOT 

GGAGCCGCTATGTCCTGGAGAGGCAGAAGCA^ 

TAGCATGAGCGTCTGGAOVAGCCCTAGGAG^ 

AGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTCACCCA^ 

CTCGCCCAAGCCOCTAGCGGACA€AGAAGGGCTGCCGTGAAA^^ 

GCATGAGTGGGTGCTCAGGGAAGGCGAA 

ATGCCGCTCCCCTCCCCAGACCCG^^ 

CTXXTTCCTCATraTT^^ 

TCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTTTAG 
GCCAGAGGCGAAGIXXrrCGGCCTCGTGGGAG^ 
CGAATGGGTCCTCAGAGAGGGAGAGGAAGCCGCTCACAT^ 
CCCAGTTTCTGTCOrreCMTGCCTCG 

GCCTCCGGCCCTGGCGGAGGCCCTWGGGGAGCOGGAGCCX3CTAGGGCT 
CGGACTC^ 

TTCGATGGCAGACaCTCCCAGACACTGAAAGa^ 

TCCTGCCTAGCCTCAGCCATTC 

GTGGCT CAGAC AGGCATTCTGTGCXrrGCTCAT^ 

CCAATGCrTTCTGCCTGTGTTTCTGG^^ 

AAAAGCTCAAGATTTTCGCTATGCCTATGCAAGACAT^ 

reCT TTCCC GAAATCTTTGGCAA^ 

CACAG TGTCCGGCAA TCTGCTCCACTGTAGC^^ 

ATCTGATTGGCCTCATGGTCTGGCTCAGC^ 

AT6CCTCCCGCTAGCT6GAGCCAAAAGAGAAGCTTTGTGTATGTGTG 

GCTCCAGGCTACGCTCATCUUUtfSAGGAAAGC ^ r n v l CTATTTCT 

TTCTGCCrGACCRTCTGTCCITCGGAA 

CTCCTGCTCAGCCATATCCATGCCTCCAGCrATATC^ 

CAGCACACTGGTCGAGGTCACCCTCG^ 

AGCATCTGCATCTGGAAACCTTTAAGGCTGT^GCTCGACGG 

CTCAM3GAAGGCGCTTGCGATGAGCTCTTCTCC 

CAGGATGGCCGCTAGGCTCCAGGGAGGCGCTCC^ 

CCGATAGCAGACTGCrOGGCCCT^ 

GCTCAGGATCTGGCTGGCCAAftGCCTOCTGAAAGACGAAGCCCTCGCCAT^ 

GGAGCCTGCCACACAGRGRCAGGATCCOGCTGCCGCTCA 

CCGCTCAGCCTATtyvCAAAGAAAAGGAAAGTGGA 

GGAGCCAGAAGGCCTGACTCCAGGCT^ 

^CCGTt^GCGGAAACCTCCltriTTATCJUSAC^ 

GGCTCGCCTATCTGCATGCCAGACrGAGAGAGCTCCTC 

TGTGAGGATGGXXCTGACGGACMGAAATGGATCC^^ 

ACTGATTAAGAGAGCX3«3^ACCAGACAGCCTGCCGCTGA 

COSGAGAGGCTGGCGCTACCGGAGGC^^ 

CftAATCATCCCCAAAACCGGATTCCTCCltXTCAA 

AIACTTTTTCCCTACOSGAATOCTCT 

GCAGAAAGGTOGCCGAACTQGTCCA C ' rriVltjC TCC^^ 

GAATTCCTCTGGGGACCCAGAGCCCrCGCCGAAACC^ 

CAGGCTCCTGGA A TlCiATC'lXJ UC TATtXCI 

ACCAAATCATGCCCAAAGCCGGACTGCTCATCA 

AGAGGCGCTGGCGCTGCCAGA«XriXXX5GCCCTA^ 
Cassettes for construction of a full-length HIV Savine

Cassette A1

ggatccaccATGACAGGCCCTTGCACAAACGTCAGCACCGTGCAAT^
CCCAACTGCTCCrniAATGGCTCCC^ 

TGACGTCAGGGACACAAAGGAAGCCCTCGACAAAATCGAACT 

TCCAGCTCCTTCAACTTTCCACAAATCACACTGTGGCAAAGGCCTCTC^ 

CCGATATGGTGATTTACCAGTACATGGACGATCT^ 

ACCCGATAAGAAACACCAAAAGGAACCACCATTCCTCTGGATGGGATACGAA 

AGGCTCTGCTCXSACACAGGCTCCrATGGCAGAAA 

TCACCAATACCCTATCTCrGAGCAACCCCTCTCCTTCriT 

GAGTTTTCCAGCGAACAGACAAGAGCCAATAGCTCCGCCrCC^ 

TCATTCTGGGATCreGCACO^AAAaxrCGCT 

GATTGAGGAGGTGCAAAAGAAAAGCGAGCAAAAGACACAACAC^ 

GGAAGGCAAAAGATTATCTCCCTt^CAGAGACAACCAATC^ 

CCACACTGTTTTCCX3CCAGCGATGCCAAA 

CCCCGCTGAC^TACAGTGCTGGAGGAGATGAACCrCCCCGGAAAATGGAAGCCT 

GGATTCATTAAGGTGAGAAAAATCGGACCCGAAAACCCTTAC^ 

CCACCAAATGGAGAAAGCTCGTGGATTTCAGAGTTAGGAT^ 

CTCCGAA<^CTCCAGGCAAACCAGAAAGAATAGGAGAAGGAGATGGGGA 

AGACTGGTCAACGGATTCTTAGCCCrcGCCr^ 

CCGTCTACTATGGCGTCCCCGTCTGGAGAGAGGCTGCCACAACCCTCTTCT 
TGCCATGGCTGGCAGAAGCGGCGGCACAGACGAAGAGCTCCTGAGGGCT 
TCCAACCCITACCCTTCXXXTTACT 
AGAAAGCCAAAGGCTGGTTCTATAGGCATCACTTTGA 

CAAAAAGGAAAAGGTCTACCTATCATGGGTACCAGCCCACAAGGGAATCGGAC^ 

ATTATCAAAATCCAAAACTTTAGGGTCTACTAT^ 

AGGAAATCTGGAACAATATGACATGGATTGAGTGGGAGAGAGAGA^ 

TCTGAAACCCGAACCCACAGCCCCTCCCGCTGAGAATT^^ 

GAGCAAAAGGATAAGGAGCAATACGATCAGATTCITATTGAGATTTC 

TGGGACCTAC«:CTGTGAATATCATTGGCAGAATTTACGAAACCT 

GATCAGAATCCTCCAGCAACrGATGTTTATCCATT^ 

AAAGGTCTCXJGCATTAGCCACGGAAGGAAAAAGAGAAAAC^ 

TGGACCCCAAGCTGGAGCCITGGAAACACCCTGGCTCCC^ 

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACAGAAAGACAAAGAACT 

CTCAAGTCCCTGTTTGGCAATGACAATTTCAATATC 

TCTTACTATGGGACCAAAGCCTCAAGCXrrTGCXj^ 

TAAAAACITCAGAAAGTATACCGCTTTCACAA^ 

GGCCAAGTGAATTGCTCACCAGGCATTTGGCAACrGGA 

AGCAATGGCCTCTGACAGAGGAAAAGATTAAGGCI^ 

TAGCATGGATGACCTCTACGTCGGCTCXX^CCTX^ 



FIGURE 30 
AGATTGGCCAACATAGGACCAAAATCGAAGAGCTCAGGGAAC^

CCAAAAGACTGAGCTCCAAGCTATCCATCTGGCT^ 

CCCGCTGAGACTGGTCAAGAGACCGCC1TTTTCATTCT 

CAGACAATGGCAGGACAAAGATTGAGGAACTGAGACCGCATCnX^ 

GCATCAGAAAGAGCCTCCCTTTCTGTCTAGTC^ 

AAGAGACGCAGAGAAAATCACACAATGAATGGCCATACTGCCACAGAG 
AGGAACreCTGGAGCTCGACAAATGGGCAAGCCTCTGGAA 

AGTGTCCCAGAATTACCCTATCGTCCAGAATGTCCAAGGCCAAATGGTCCACCAACCCCT 

GGACTGAGAATCGTTTTCGCTGTGCTCAGCATTATCAATAGGOT 

CCCTCCCCCTCATCCATCTCCAAT^ 

AGTGAGAAGGAGATGCXSAATACGCTGTGGGACTCGG^ 

ATGGGCGCTGCCTCCATGACACTGACAGTGCAAGCCTATGACCCT 

AGGGCCAGGGT(^GTGGACATTTCAGATTTTCCAAGAGCCTT^ 

CGTCAACATCATCX^AAGGAACATGCTGACACAGCTITGGCCG 

GCTATCTTTCAGTCCAGCATGCGACAGATT 

ATCCTAGCCCTCTGACATTCGGATGGTGTTT 

GGGCGAAAACAATTGCCCCCTGTTTAGGAAATACACAGCCr^ 

ATTAGGTATCAGTATAACGTCCTGCCTCAGGGATGGGGAAG^ 

AGGCTAGGCTACTGCTCAGCGGAATCGTCCAGCAA^ 

GCCTOTGCATGGCGTCTACTACGATTCCTCCAAGGATCTGOT 

TCCACC^TGGTGGATATtXXIAAACrACGACCT 

TCATGTTCATTCACTTTAGGATTGGCTGC 

C^GGAAAAAGGGATGCTGGAAGTGTGGCAGAGAGGGACAC^^ 

CTGGGAAAGGATGCCAGACTGGTTATCAAAACCTATTG 

ATGGCGTCAGCATTGAGTGGAGGATAAGGGAAAGGGCTX^ 

GCTCAGCACATTGGTGGACATGGGCAATTACGATCTGTCT 

ATCGAAGAGGAAGGCGGAGAGCAAGGCAGAGGCAGAAGCGTCAGGCTCGTC 

ATGAGGGAGAGAATAACTGTCreCTTCACCCTATCAGTt^^ 

CGATATCAAAGTGGTCCCCAGAAGGAAAGCCAAAATC 

GTGGCCAGCTTCTCTTCCGAGCAAACAGGGGCTAACT 

ACAGACAGGGAACAAGCTCCAGCTGTTTCAATTG 

GGCAAAATCTGGCCCTCCAACAAAGGCAGACCCXX5AAACTTTCT 

TTATCATGATCGTCGGTGGACTGATTGGCCTCAGG 

CCGAGACCTCGATAAACATGGCGCTATTACAAGCTCCAATACrc 

GCTGCTGCCATGACACCCCTGGAGATCATCGCTAT 

GGACAATCGTCTACATTGAGTATGTCGACtgaagatctgaattc 
A2 fragment

ggatccaccATGACAGGCCCTTGCACAAACGTCAGCTC

CCCAACTGCTCCTGAATGGCrCCCTGAAAAGCCrCT 

TGAGGTCAAGGACACAAAGGAAGCCCTCGACAAAATCGAACTC^ 

TCC^GCTCCATCAACTTTCCACAAATCACACrrc 

CCXSAAATGGTGATTTACCAGTACATGGACGATCTGTAT^ 

ACCCGATAAGAAACACCAAAAGGAAC<^CCATTCCTCTGGATGGGATACGAACTC 

CAGCCTTTTAATTTCCCTCAGATTACCCTCTGGCA^ 

AGGCTCTGCTCGACACAGGCrCCTATGGCAGAAAGA 

TCACCAATACCCTATOTCTGAGCAACCCCTCTCCTTCTTTAGGGAAAAC 

GAGTTTTCCAGCGAACAGACAGGAGCCAATAGCTCCGCCTCCAGGAAGA 

TCATTCTGGGATCTGGCACCAAAAACXX:CGCTACTAGAAGAATCGATGTC 

GATTGAGGAGGAGCAAAACAAAAGCAAGCAAAAGACACAACAGGCTG 

GGAAGGCAAAAGATTATCTCCCTGACAGAGACAACCAATCAGAAAACCG^ 

CXJACACTGTTTTGCGCCAGCGATGCCAAAGCCTAT^ 

CCCCGCTGACGATACAGTGCTGGAGGAGATGAACCTCCC 

GGATTCATTAAGGTGAGAAAGATCX3GAC(XX5AAAACCCTTACAATACCCC^ 

CCACCAAATGGAGAAAGCTrCGTGGATTTCAGAATTAGGAT^ 

CTCCGAAGGCACCAGGCAAACCAGAAAGAATAGGAGAAGGGGATGGG 

AGACTGGTCAACGGATTCTITAGCCCTCGCCTGGGACGATCT^ 

CCGTCTACTATGGCGTCCCCCTIXrrGGAGAGAGGCT 

TGCCATGGCTGGCAGCAGCGGCAGCACAGACGAAGAGCTCC^ 

TCCAACCCITACCCrTCCGCTAGTATGAAAATCAGAACCTGGAAGAGCC 

AGAAAGCCAATGGCTGGTTCTATAGGCATCACTTTGAGGAGTCro 

CAAAAAGGAAAAGGTCTACCrATCATGGGTACCAGCCC^ 

ATTATCAAAATCCAAAACTTTAGGGTCTACTATAGGGATAGCAGAGACC 

AGGAAATCTCGAACAATATGACATGGATTCAGTGGGAGAGAGAG^ 

TCTGAGACCCGAACCCACAGCCCCTCCCGCTGAGAATTTCGGATTC^ 

GAGCCAAAGGATAAGGAGCAATACGATCAGATTATTATTGAGATTTGCG^ 

TGGGACCTACCCCTGTGAATATC^TTGGCAGAATTTACGAAACCT 

GATCAGAATCCTCC7VGCAACTGATGTTTATCCATTTCA 

AAAGGTCTCGGCATTAGCCACGGAAGGAAAAAGAGAAAACAGAGAAGGCGAG 

TGGACCCCAACCTGGAGCCTTGGAAACACCCTGGCTCCCAGCCTA^ 

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACAGAAAG^ 

CTCAAGTCCCKnTTGGCAATGACAATTTCAATATG^ 

TCTCACTATGGGACCTyVAGCCTCAAGCCTTGCGTCAAGCTCG 

TAAAAACrTTCAGAAAGTATACCGCTTTCACAAT^ 

GGCC^GTGAATTGCTCACCAGGCATTTGGCAACT^ 

AGCAATC^CTCAGACAGAGGAAAAGATTAAGGCTCTGACTC 

TAGCATGGATGACCTCTACGTCGGCTCCGACCTGGAGATTGGCC^ 



Figure 30 (Cont) 




CACCTCCTGAGATGGGGACTCACCGAC^ 

ACTCCGGCTTAGAGGTCAACATTGTGACAGACATTCCCGC^ 

ACTGGCTGGCAGATGGCCTGTGAGAATCATTCACACAGACAA^ 

CTGCTCAAATGGGGCTTCACAACCCCTGACAAAAAGCGTCAGA^ 

TGACAGAGGATAAGTGGAACAAACCCCAGAAAATCAAGGGACAC^ 

CACAGAGTCCCAGAATC^GCAAGACAGAAACGAAAAGGAACrGCTGGAGCTCGA 

TGGTTTAACATTACCGACACCGGAAGTAGCTCCCAAGTGTCCCAGAATO 

AAATt^TCCACCAACCCATCTCCCCCAGACTCGTCGGACTGAGAATCATTTTC^ 

GGTCAGGCAAGGCTATAGCCCTCTGTCCTTCCAA^ 

GACTCCACCATTAGGAGAGCCATCCTTGGACACAGAGTGAGCAGGAGATC 

TGTTCCTTGGCTTTCrGGGTGCOKrrGGCT 

CCCTAGCAAAGACCTCATTGCTGAGATTCAGAAACA 

TTCAAAAACGGAACCXTTCCTGGTCXXXJCCT^ 

GCACCCTCAACTTTCCC7VTTAGCAAAGGCAGCCCTGCT 

TAGGAAACAAAACCCTGACATGGTCATCTATCAGTATCCTAGCCCTCT^ 

CCCGTGGACCCCAGCGAAGTGGAAGAGACCAACAAGGGCGAAAACAATTGCCTCCTG 

TTACCATTCCCTCCACCAATAACXJAAACCCCTG 

CACAATGGGAGCCGCCAGCATGACCCTCACCGTCCAGGCrAGGCAAOT 

AATCTGCTGGAGGAGAATAGGGAAATCCTCAAAGAGCCTGTGCATGGCGT 

TCGCTGAAATCCAAAAGCAAGGCACAGAGGAACrcTCCGCCrTGGTGGA 

CAATAACCTCXX!CGCTATTAGAATCCTGCAACAGCTC^TGTT 

ATTGGCATCATCCGTCAGAGAAGGGCCAGAGCTCCCAC^ 

AGATGAAGGATTGCACTGAGAGACAGGCTAACTTTCTGGGAAAGGAT^ 

ACTGCATACCGGTGAGAGAGACTOGCACCTCGGCCATGG 

GATAGCGGCAACX5AAAGCGAAGGCGAC^VGAGAAGAGCTCAG 

GCCCTGCCCCCAGGGGACCCGATAGGCTGGAGAGAATOT 

CAGGCTCGT6AATGGCAGTGAGGGKX3AGGAAGTCA 

CATGGCATGGAAGAOSAAGACAGAGAGGTCAATAGCGATA 

GGGATTACGGAAAGCAAATGGCTGACGATGACTGTC 

TGCAAGCAGAAAGCTGGGAGACGGAGGCGGAGCCGAOK^CAGG 

GAGGGACACATTGCCAAAAGCTGTAGGGCCCCTCGCAAGAAA 

TGAAAGACTtn'ACCGAAAGGCAAGCCAATTTCCTCGGCAAAATCT^ 

TCTCCAAAGCAAATGGCTCTGGTATATCAAAATCTTTATC^ 

TTTGCCGTCCTGTCCATCATTAACGGGGCCGTC 

CCGCTGCCAATAACCCTGACTGTGTCIXKXrrc^ 

CCTTATCGTCGCCXrTCATCATAGCCATTGTGGTCTGGAC^ctgaattc
B1 fragment

ggatccaccATGCTCGAGAATATGCTCACCCAAATCGGATGCAC^CTCAATTTCCCTATCTCCCCCATTGAGACAG

TGCCTGTGAAACTGAAACCCGGAATGGATGGCGCCGCCACC^^ 

GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGC^ 

GTCAACACACCCCCACTGGTCAAGCTATGGTATCAGCT 

TCAACACGATGCTGAATCTTGTAGGAGGCCATCAGGCCG 

CTCTCTCCrroTTTCTGGATGGCATTGAO^ 

GCCAACGACTTTAATCTGATGAAGCATCrrCX5TCTGGGCCTCT 

TGCIX^AGACATCCXSAAGGCTGTCAGCAAATTGCTGAG 

TGTCAAAACC^TTATCGTCCAACTCAACGAAAGCGTCGAGATTAACATG^ 

GGCAAGCrTCGACGCCKjGGAAAAGATTAGGCrCAGC^ 

GCCTGGAGGGACTGGTTTACTCCAAAAAGAGGCAAGACATTCT 

TAGATGGGGAACCATGATCCTCGGCTTGGTGATTATCTGTAGCGC 

GGAGTGCCnX3TGTGGAGGAGACAGCTCCTGTCCGGCATT^ 

CCCAACAGCATCTGCTCCAGCTCACCGTC^ 

CATCTATGAGACATACGGAGACACATGGGCGGGAGTGGAAGCCCrrcAC^ 

CCTCCCCTCCCATCCGTG7^AA7VAGCrrCACCGAAGACAGATGGAATC 

GGATTATCGATATCATTGCATCCGACATTCAGACTAAGGAACTGCAAA 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGCATTGGCGGCT 

GCCACCGATATCATTCCXX3TGGGCGAAATCTATAAGAGATGGAT 

ATCTACCCGTCAGCATTCTGGATATCAGAGTGAGACAGGC^ 

TCCCAGAGGCCCTGACAGACTCGGAGGCATTGAGG 

CCTCTGCCTC^GACAAGGGGAGACAATCCCACAGACCCT 

TGAATAAGGAACTGAAAAAGATTATCGGACAGGTCAGGGACC^ 

TGCCATGCAGATGCTCAAGGATACC^TTAACGAAGAGGCTTC 

CCCGTTCCC*:CTCTCACCGAGATTTCT 

CCTATAACACACCCATCTTTGCCATTCAAGTGAGAGAGC^ 

CTTCATTCACAATTTCAAAAGGAGAGGCGGAA 

GACTTTAGGGAGCTCAACAAACGTACACAGG^ 

ACCTCAGGAGCCTGTGTCItnTCTVGCTATCACAGA 

TAGCAGAATCGGCATCACTAGGCAACGTAGAGGTAGGAACGGCG^ 

GACCCCATTCCX!ATTCACTATTGCGCTCCCGCTGGCT^ 

CCTCGCCGATCAGCCTAGCCTCTATCCTC 

GCCGCTAGAAGGGCTATCCTCX5GCCATATAGTCAGGAG 

CCCTCC^TACCTCGCACrCAGTCAACCCACAACCXSCTTC 

TCAGGTCIXXTTTCCTGAAGAAGGGACTGGGAATCAG 

AGCAGGCAAGACGAAGACGCAGCCAAGTACCATAGCAATTGGAG^ 

TCGTCCCTAAGGAAATCXn'CGCAAATTGCAATAAGTGT^ 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGCATG 
TCCCTGAAACCCrGTGTGAAACTGACACCCCTCrGC^
ACTCCACCCAAGTGGACCCCGATCTGGCTGACCAACTGATTC^ 

AATCCATCCCATCGGCCAACAC6GAATGGAGGATGAGGATAGGGAAGTG CTGAAATGGAAATTCGATAG CCATCTG 
GCTCTCAGGCATATCGCTTCTAGTCCTATCGATACCGTCCrc 
TGAAACACTGGCCCCTCACCGAAGAGAAAATCAAAGCCATTTGGCCTAG 
GCAGTCCAGGCCTGAGCCTACCGCArc^ 

TTTTACAGACACCATTACGATAGCCGACACCCrAAGGTCAGCrCCGAGGTCC^ 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGGTACIt^ 

CATTCCTCCCATTGTGGCCAAAGAGATTGTGGCAAACTC 

CAGGTGAACTGTAGCCCTTCCGAGGGAACAAGACAGACTAGG^ 

GGCAAATCCACTCCATCTCCGAGAGGATTCnXSGGACAGATC 

AAGCA»CTGCAAGAGCAAATCGCATGGATGACAAGCAATC 

AACCCTCAGTCCCAGGGCGTCGTGKSAAAGCATCAA^^ 

ATCTCTGGGTCTACCATACCCAAGGCTATCT 

TAGCAGAGAAAGACAGAGACAGATTCATTCTATTAACGAATGGATTCT 
CCTGTGCCTCraCAACTGTATAAGACA^ 

CACTGCTCGTGCAAAACGCTAACCCTGACTGTGAGAGAGTGTA 

CGGAAAOTAACAGGTGGACAAACltXSTCAGCGCTGGCATTAGGAAAACAGACCCT 

GAAAACGTCACOIAGAACTTTAAC^TGTGGAAAAACGATATGGTG^ 

TGAAATGCAATAACAAAAGGTTCAACGGAACTGGACCCAGTAA^^ 

AGAGCTCAAGAATAGCXSCTATCTCCCTGCTCAACGCT^ 

GAAGTGGTTCAGTCCCGGCATCCCAAAGTGTCC^ 

GGACATACTGGGGCCTCCACACAGGCXJCTGCTATGGGCGGTAAATC^ 

AGTGAGAGAGAGAATCAGACAGACACCCCCTGCCGCTGAGGGAGTG 

GGTGCX!CZATA(XIAATGACGTCAAGCAACTGACAGAGGCTGT^ 

TGAAATACTGGGGGAATCTGCTCCAGTACTGGGGCCAGGAA 

AGCCATTGAGCTGCCTGAGAAAGAAAGCTGGACCGTCAACGATATCC^ 

TCCCAGATTTACCCCGGAAGAGCCATTGAGGCTCAGC^ 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACOTCGCC^ 

TAGCCAATACX3CTCTAGGCATCATTCAGGCTC 

ATTTACAAGATCCTCACCGAATCTCAAAATCAACAGGATAGGM 

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGCCGTCGGCATTGGCX5 

ACCCAAAATGATCGGAGGCATTGGAGGCTTTATCAAAGTC^GG 

AACAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTCT^ 

GCTCCCTCAAAGGCCTCCAGAGAGGCACACTGAATC 

AGTGATTCCCATGTTTTCCGCTCTGTCCGAGGGAGCCACACT 
B2 fragment

ggatccaccATGCTOSAGAATATGCTCACC^ 
TGCCTGTGAAACTGAAACCCGG^ 

GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGCCTCTGGGAA^ 

CTTCAACACACCCCCACTGGTCAAGCTATGGTAT^ 

TCAACACGATGCCXtfUVTACTGTAGGAGGCCATCATC 

CGCTGTCCTGTTTCTGGATGGCATTAACAAAGCTCAA 

GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGGCCT 

TGCTGGAGACATCCGAAGGCTGTAAGCAAATTGCT^ 

TCTCAAAACCATTATCGTCCACCTCAAC^^ 

GGCAAGCTGGACGCCTGGGAAAAGATTAGGCTC^ 

GCCTGGACGGACTGATTTACTCCCAAAAGAGGCAAGACATT^ 

TAGATGGGGAACCTTGATCCTCGGCTTGGTGATTATCTGTAGC^ 

GGAGTGCCTGTGTGGAGGAC&CAGCTCCTGTCTC 

CCCAACAGCATCItXrrCXIAGCTCACCGTCTGGGT 

CATCTATGAGACATACGGAGACACATGGTCX^AGTGGAAGCCCTCAAA 
CCTCCCCTCCCATCCGTGAAAAAGCrCAC 

GGATTGTCGATATCATTGCAACCGACATTCAGACTAAGGAACTGCAT^AACCAA 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGCATTGGCGGCTACTCCG 

GC<^GCGATATCGTTCCCGTGGGCGATATCTATAAGAGATGGATCATTCT 

ATTCACCCGTCAGCATTCTGGATATCAGAGTGAGAC^ 

TCCCAGAGGCCCTGACAGACTCGAACXX!ATTGAGGAAGAGTC 

CCTCItTrCTCAGACAAGGGGAGACTUVrcCC^ 

TGAATAAGGAACTXSAAAAAGATTATCGGACAGGTCAGGGAC^ 

TGCCATGCAGATGCTCAAGGATACCATTAACX3AAG 

CCCATTGCCCCTCTCACCGAGATTTGTAAAGAAATGGAAA 

CCTATAACACACCCGTCTTTGCCATTCAAGTG 

CITCATTOVCAATTTCAAAAGGAAAGGCGGAATCGG^ 

GACTTTAGGGAGCTCAACAAACTTrACACAGGATTO 

ACCTCAGGAGCCTGTGTCTGTTCAGCTATC^ 

TAGCAGAATCGGCATCACTAGGCAACGTAGAGGTAGGAACXKSCTCCT 
GACCCCATTCCCATTCACTATTGCGCTCCCGCTGGC^ 
AAAAGGATTGGCATCTGGGACAGGGAGTGTCCATCGAATGGAG 
CXTTCGCCX^TCAGCCTAGCCTCTATC 

GCCGCTAGAAGGGCTATCCTCGGCXIAAATAGTCAGGAGAAGGTGTC 

CCCTGCAATACCTTGCACTCAGCCAACCCAAAACCGCT 

TCAGGTCTGCTTCCTGAAGAAGGGACTGGGAATCAGGG^ 

AGCAGGCAAGACGAAGACGCAGCCAAGTACCATAGCAATTGGAG^ 

TCGTCGCTAAGGAAATCGTCGCAAGTTGTGAT 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGCATGGCCT 
TCCCTGAAACCCTGTGTGAAACTGACACCCCTC

ACTCCACCCAAGTGGACCCCGATCTGGCTGACCATCTGAT^ 

AATCCATCCCATGGGCCTACACGGAATGGAGGATGAGGAAAGGGAAGTGCTC 

GCTCTCAGGCATATCGCTTCTAGTCCTrATCGATACCGTCCCCGTCAAGCT 

TGAAACAGTGGCCCCTCACCGAAGAGAAAATCAAAGCCAT^ 

GCAGTCCAGGCCTGAGCCTACOSCACCCCCTVGCCGAGAACri^ 

TTTTACAGACACCATTACGAAAGCCAACACCCTAAGGTCAGCT 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGC^ 

CATTCCTCCCATTGTGCCCAAAGAGATTGTGGCAAA^^ 

CAGGTGGACTGTAGCCCTTCCGAGGGATCAAGACAGGCTAGGAAGAACA(^ 

GGCAAATCCGCGCCATCTCCGAGTGGATTCTGGGACAGATAAGGGAACCC^ 

AAGCACACTXSCAAGAGCAAATCGCATGGATGACAAACAATC^ 

AACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACA 

ATCTCTGGGTCTACAATACCCAAGGCTTTTTCCCTGACTGG^ 

TAGCAGAGCAAGACAGAGACAGATTCATGCTATTAGCGAAAGGATTCT 

CCTGTGCCTCTtKIAACTGTATAAGACAC^ 

CACTGCTCGTGCMAACGCAAACCCTGACTGTGAGAAAGTC 

CGGAAACGAACAGGTGGACAAACTGGTCAGCGCTGGCATTAGGA 

GAAAACGTC^CCXSAGAACTTTAACATGTGGAAAAACA^ 

TGAAATGCAATAACAAAAAGTTCAACGGAACTGGACCCTGTAAGAATGTGTCC^ 

AGAGCTCAAGAATAGCGCTGTCTCCCroCrcAA 

GAAGTGGTTCAGTCCCAGCATCCCZAAAGTGTCCAGCXaAA^ 

AGACATACTGGGGCCTCCACACAGGCGCTGCTATGGGCGGTAA^ 

AGTGAGAGAGAGAATCAGACAGACACXX:CCTGCCGCTGAGGGAGTGCT 

AGTGCCCATACXIAATGACGTCAAGCAACTGACAGAGGTTGT^ 

TGAAATACTTGTGGAATCTGCTCCrrc 

AGCCATTGTGCTGCCTGAGAAAGAAGGCTGGACCGTCJ^ 

TCCCAGATTTACGCCX^AAGAGCCATTGAGGCT 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACCTC^ 

TAGCCAATACGCTCTAGGCATCATTCAGGCI^^ 

ATTTACAAGATCCTCACCGAATCTCAAAATCAACAGGAT^ 

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGCCXmXSGCATT^ 

ACCCAAAATGATCX5GAGGCATTGGAGGCTTTATCAAACT 

CAGAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTC^ 

GCTCCCTGAGAGGCCTCCGGAGAGGCACACTGAATGCCT 

AGTGATTCCCATGTTTACCGCTCTGTCCGAGGGAGCCACACTCGAG tgaagatctgaattc 
C1 fragment

ggatccaccATGCTCGAGAGCAACACACCCXX:^

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCTTGGAGGGCTATCCT

AGGCTTTGAGAGAGCCCTCCTAGCCGCCGAATGGGACAGGG 

CAAATGAGAGAGCCCAGAGGAAGCGATATCGCTGGCACAACCCTC^ 

TCAGCTTGTTTCTGAAAGAGAAAGGC^ACTGGAAGGCCTCATOT 

CX?AAGACCAAAGCCCTCAGAGAGAGCCTTACAATGAGTGGACCCTGGAGCT 

CAAGGCCAATGGACCTACCAAATCTTTCAGGAACC 

GCGCTCACACAAACTGGATGACAGAAACCCTCCTGGTCCAGAATGCCAATCCC 

TCTGGGAACCGGAGCCACACTGGAAGAGCCTGAGGT^ 

CAAGACCTGAATACGATGCTCAACATCGTCAGCGC^ 

ATAACCCTCCCATCCCTGTCGGAGAGATTTACAAAAGGTGGATTATCCT 

CGGCCTCAAGAAAAAGAAAAGCGTCACCXSTCCItX^TGTG 

CAAAAGGAAACCTOGGAGGCTTGGTGGACXjGAATACTGGC^ 

CCCCTCCCCTCGTGTTTCCCGATTGGCATAA^^ 

GTGCTTTAAGCTCGTGCCTGTGGACCCCAAACTGTGCT^ 

TTTTACX3TGGACGGAGCCGCCAACAGAGAGACAAAGCTCGGCC^ 

TTAGCCCCAGGACCCTCAACXXTITGGGT^ 

CTGGGCTACCCATGCXrTGTGTGC^TACCGATCCCAATCCCCAAC^ 

GATCAGAAACTCCTCGGCATTTGGGGATGCTCCGGCAAAATCAT^ 

GGTCCAAC?CAAGCTGGCCATAACAAAGTGGGAAGCCTCXZA 

AATCAAACCCCCTCTGCCTAGCGTTAAGACAATCAT^ 

CCTAACAATAACACAAGGAAAGCCGCaKrrAGTGAAGTACX^ 

CCGATACAGGCX1ACTCCAGCCAGGTCAGCCAAAACT 

CGCTTGTTGGTGGGCCAATATCAAACAGGAGTTTGGAA^ 

GGCGCTGCCAATAGGGAAATCCAACTGGGAAAGGCGGGCTAT^ 

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCTCGTGM 

ATCAATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGGATTAGGCG 

CTGTCTCCCAGGATCTGGATAAGTACXX3AGCCCTCACCT 

TGGCGTCGGCAACCCTCAGATTTTGGGAGAGTCCAGCGTTGTCCT 

ACCCCTAAGTTTAAGTTCCCCATTCAGAAAG 

ACAGACTGATCAGCTGTAAC^CAAGCGTTATCAAACAGGCT^ 

TTACTGTGCCCCTCCTAGCTGGATGGGCTATGA 

GAAAAGGACTCCTGGACAGTGAATGACATTCAGAAATC^^ 

AAATGATGACAGCATGTCAGGGAGTGGGAGGCCCTGGCCATAAGGCTAGAGTCT 

CATTTGGAAAGGCCCTGCCAAACTGCTCTGGAAAGGCGAAG 

CAACTGATAGAACKXXnXICTGGATACAGGAGCCGATGACACCCT 

GAATCAAACAGCTCXAGGCTAGGGTCCTGGCT 

CTGTAGCGGAAAGGCTGCTATGGAAAACJU3ATGGCAAGTGA 
ACATGGAATAGCCTCGTGAAACACCATATGTATATT^

TGTGTATATCGAATACAAGAAACTGCTCAGGCAAAGGAGAATCGATAGG 

CTCGAAACCGCrGAGGGATGTAAACAGATCCTGGAACAGCTCCAG^ 

CTAGTAGAAAGCTCCTGAAACAGAGAAAGATTGACAGACTGATTGAGAGA^ 

CAATGAGTCCX5AGGGAGACACACCCGGAATCAGATACCAATACAATGTG 

CCCATTTTCCAAAGCTCCATGACCC7U\ATCCT 

AGTGCTTCAACTGTGGAAAGGAAGGCCATCTCGCTA 

CTCX^GAGGATAGTOACACCTCCGGCACACAGCAAAGCCAAGGCACA 

GTGGC^GCX^ATATATCGAAGCCGAAGTGATCCCTO 

TTAAGCCTGTGGTC^GCACACAGCTCCKXRRC^

CTTTACCGATAACAAACTGGTCGGCAAACTGAATTGGGC^

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACTCCTCT^

ATGTGAATGCTGCTCAAACCAGAGGCGATAACCCTACCGGT^

AGAGACAGACCCTTGTGACGCCGCCX:CTAGCTCCAACI^

CCCXRCTCTGGAAAGGCTCCACCTCGACTGTAGCGAAGACT^

GGTTCAATATCACCAACTGGCTGTGGTACATTAAGATTTTCATO 

GTACTCACCTGTCTCCATCCTCGACATTAA 

AAGCTCCKrTCGAAGGGAGAGGGAGCCGTCGTGATTCAGGA 

CTAAGATTATCGAACTGAATAAGAGAACCCAAGACTTTTGTGAAGTC 

GAAGAAGAAAAAGTCAGTGACAGTGGCCGCTATGAGAGTGAA^ 

TGGGGCACAATGATTCTGGGACTGGTCATCATTTGCTCCGCCT 

GGGGTACAAAGGCTCTGACAGAGATTGTGACACTGACAGAGGAA^ 

CTCCCGCCTCGCCCTGAGACATATCX^CAGGGAACTGCATC 

CTGGGACGCTCCAGCCTCAAGGGACTGCAAAGGGGATGOT 

GGGGCTCTAGCCTGGGGCAACTGCAACCrGCTCTGA 

CX3CTACCCTCTGGTGTGTGCATCAGGAGCTCTA 

ACCAGAGCC^UVAAGGAGAGTGGTCGAGAGAGAGAAAAGGCT^ 

TGGAGCTGGAGGAAAACAGAGAGATTCTGAGGGAACCCGT 

CCAAGTCAACAATGCCAACATCATGATGCAGAGAGGC^ 

GAGGAGGTCGGCTTCCCCGTCAGGCCCCAGGTCCXIACTGAGACCT 

TCTTCAGACAGGGACX^CAAAGAGCCTTTCAGAGACTAT^ 

CTCACAGGAAGTGAAAAACTGGGAGAAAATCAGACTGA 

GTGTGGGCCTCCAGGGAACTGGAAAGGTTTGCCTCCCAG 

CCGAGTCCGAGCTCGTGAATCAGATTATCGAAGAGCTCAl^^ 

C^TTGAGGTCGACCAAAGGGCrTGGAGAGCCATTCTGA^ 

AGGTGGCCCGTCAGGACAATCTATACCGATAACGGAAGC^ 

GGGCTGATGTGAAACAGCTCACCGCAGTCGTC^ 

CAAGTTCAGACTGCCTATCGCTGCCGCCAGCAACGA 
C2 fragment

ggatccaccATGCTCGAGAGCAACACAGCC^CTAACAATACCGATTGCGTGTGGCTGAAAG

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGA^ 

AGGCCTTGAGAGAGCCCTCCTAGCCGCCGAATGGGATAGGATTCACCCnX^ 

CAAATGAGAGAGCCCAGGGGAAGCGATATCGCTGGCACAACCCTCAGGCCCA 

TCAGCTTGTTTCTGAAAGAGAAAGGCGGACTGGA 

CGAAGACCAAAGCTCTCAGAGAGAGCCTTACAATGAGTGGACCC^ 

CAAGGCCAATGGACCTTCCAAATCTTTCAGGAACCCTTTA 

GCGCTCACACAAACItXSATGACAGATACCCT 

TCTGGGACCCG^AGCCTCACTGGAAGAGCCTC 

CAAGACCTGAATATGATGCTCAACACCGTCC&r 

ATAACCCrCCXATCCCTGTCGGAGAGATTTACA^ 

CGGCCTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATGTGGGAGACGCTT 

CAAAGGGAAACCTGGGAGGCTTGGTGGATGGAATACIW 

CCCCTCCCCTCGTGTTTCCC^TTGGCAAAACTATA 

GTGCTTTAAGCTCGTGCCrrGTGGACCC^ 

TTTTACGCXXIACGGAGCCGCCAACAGAGAGACAA^ 

TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCATCGAAGAGAAAGGCTTTA 

CTGGGCTACCCATGCCTGTGTGCCTACCGATCCCAATCCCCAAGAG 

GATCAGAAACTCCTCGGCATTTGGGGATGCTC 

GGTCCAACCCAGCTGGCCATAACAAAGTGGGAAGCCTC 

AATCAAACCCCCTCTGCCTAGCX^ 

CCTAACAATAACACAAGGACAGCCGCCGCTAGTGAAGTACAG 

CCGATACAGGCAGCTCCAGCAAGGTCAGCCAAAACrATC 

CGCTTGTTGGTGGGCCAATATCAAACACKaAGTT^ 

GGCGCTGCCAATAGGGAAACC^UVACTOGGAAAGGC^ 

GAATCTGGCAGCTCGACTGTACCCATCTGAAAGGCA^ 

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATJU^^ 

ATCGATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGC^ 

CTGTCTCCCAGGATCTGGATAAGTACGGAGCCAT 

TGGOH'CGGCAACCCTt^GATTTTGGGAGA^ 

ACCCCTAAGTTTAAGCTCCCCATTCAGAAAGAGACATGGGAAACCTG^^ 

ACAGACTGATCAGCTGTAACACAAGCGTTATCACACAGGC^^ 

TTACTGTGCCCCTCCTAGCTGGATGGGCTATGAGCTC 

GAAAAGGAGTCCTGGACAGTGAATGACATTCAGAAAACAATTC^ 

AAAATATGACAGCATGTCAGGGAGTGGGAGGCCCTGGCCATAAGGCT 

CATTTGGAAAGGCCCTGCCAAACrGCTCTGGA 

CAACTGAAAGAAGCCCTCCTGGATACAGGAGC^ 



CTGTAGCGGAAAGGCIT3CTATGGAAAACAGATGGCAAGTG 
ACATGGAATAGCCTCGTGAAACACCATA

ATAAGTCCTTCGAAGAGATTTGGAATAACATGACCTGGATTGAAT^ 
TGTGTTTATCGAATACAAGAAACTGCTCAGGCAAAGGAA^ 
CTGGAAACCGCTGAGGGATGTT^AACAGATCCTGGAACAGCTCCAGC^ 
CTAGTAGAAAGCTCCTGAGACAGAGAAAGATTGACAGACriX^TTGAGAG 
CAATGAGTCCGAGGGAGACACACCCGGAATCAGATACCAATAC7VATGTGCT 
GCCATTTTCGAAAGCTCCATGACCAAAATC 
AGTGCTTCAACTGTGGAAAGGAAGGCCATCTCX3CTAGGAA1TGCA

CTCCGAGGATAGCGACACCTCCGGCACACAGCAAAGCCTVAGGC^CAGAGACAG 

GTGGCCAGCGGATATATCGAAGCCGAAGTGATCCC^ 

TTAAGCCntSTGGTC^GCACACAGCTCCTGCrCAACGGTAGCCT 

CTTTACCAATAACAAACTGGrrCGGCAAACTGAATTGGGC^ 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACCC 

ATGTGAATGCIXXrrCAACCCAGAGGCGATAACCCTACCGATC 

AGAGACAGACCCTTTTGAOXrCGCCCCTAGCT 

CrcCCTCTGGAAAGGCTCCACCTCGAC^ 

GGTTCAATATCACCAACTGGCTGTGGTACATTAAGATTTTC^ 

GTACCAACCTGTCTCCATCCTCGACATTAAGCAAGGCCCTAAGGAAC^ 

AAGCTCCTGTGGAAGGGTVGAGGGAGCCGTCGTGATTCAGGACAACT^ 

CTAAGATTATCX3AACTGAATAAGAGAACCCAA(^ 

GAAAAAGAAAAAGTCCGTGACAGTGGCCGCTATGAGAGTGAAAGAGA 

TGGGGCACAATGATTCTGGGACTGGTCAT<^TTTC 

GGGGTGCAAAGGCTCTGATAGACATTGTGCCACTGACAG 

CTCCCACCTCGCCCTGAGACATATCXJCCAGGGAACTGCATCCCGAGT^ 

CTGGGACGCTCCAGCCTCAAGGAACTGCGAAGGG 

GGGGCTCTAGCCTGGAGCAACrrcCAATCreCT 

CGCTACCCTCTGGTGTGTGCATCAGK^ 

ACCAAAGCCAAAAGGAGAGTGGTCCAGAGAGAGAAAAGGCTCACCGA 

TGGAGCTGGAGGAAAACAGAGAGATTCTGAAGGAACCCGTCX^ 

CCAAG(X7VACAATGCCAACATCATGATGCAGAGAGGCAAT^ 

GAGGGGGTCGGCTTCCX^CGTCAGGCCrrCAGGTCCCACTGAGACCTATGA 

TCTTCAAACAGGGACCCAAAGAGCCITTCAGAC^ 

CI^C^GGAAGTCAAAAACTGGGAGAAAATCAGACra 

GTGTGGGCCTCCAGGGAACItXSAAAGGTTTGC^ 

CCGAGTCCGAGCTCGTGAGTCAGATTATCGAAGAGCTC^ 

CATTGAGGTCXSTCCAAAGGGCTTGGAGAGCCATTCTGAATATCrc 

AGGTGGCCCGTCAAGATAATCXATACCGATAACG^ 

GGGCTGATGTGAAACAGCTCACCGAAGTCGTTCM 

CAAGTTCAGACAGCCTATCGCTGCCGCra tctgaattc 